Abstract

We consider the efficient estimation of the semiparametric additive transformation model with current status data. A wide range of survival models and econometric models can be incorporated into this general transformation framework. We apply the B-spline approach to simultaneously estimate the linear regression vector, the nondecreasing transformation function, and a set of nonparametric regression functions. We show that the parametric estimate is semiparametric efficient in the presence of multiple nonparametric nuisance functions. An explicit consistent B-spline estimate of the asymptotic variance is also provided. All nonparametric estimates are smooth, and are shown to be uniformly consistent with convergence rates faster than the cubic rate $n^{-1/3}$. Interestingly, we observe a convergence-rate interference phenomenon: the convergence rates of the B-spline estimators are all slowed down to equal the slowest one. Constrained optimization is not required in our implementation. Numerical results are used to illustrate the finite-sample performance of the proposed estimators.
1 Introduction



We consider the efficient estimation of the following semiparametric additive transformation model: 



$$ H(U) = Z'\beta + \sum_{j=1}^{d} h_j(W_j) + \epsilon, \tag{1} $$



where $H(\cdot)$ is a monotone transformation function, the $h_j(\cdot)$'s are smooth regression functions (with possibly different degrees of smoothness), and $\epsilon$ has a known distribution $F(\cdot)$ with support $\mathbb{R}$. A wide range of survival models and econometric models can be incorporated into this general transformation framework, e.g., Huang & Rossini (1997), Shen (1998), Huang (1999) and Banerjee et al. (2006, 2009). In particular, model (1) can be readily applied to a failure time $T$ by letting $U = \log T$. We obtain the partly linear additive Cox model of Huang (1999) by assuming $F(s) = 1 - \exp(-e^{s})$ and $H(u) = \log \Lambda(e^{u})$, where $\Lambda$ is an unspecified cumulative hazard function. Specifically, the hazard function of $T$, given the covariates $(z, w)$, has the form



$$ \lambda(t \mid z, w) = \alpha(t) \exp\Big(\tilde\beta' z + \sum_{j=1}^{d} \tilde h_j(w_j)\Big), \tag{2} $$



where $\alpha(t)$ is the baseline hazard function, $\tilde\beta = -\beta$ and $\tilde h_j = -h_j$. If we instead take $F(s) = e^{s}/(1 + e^{s})$, model (1) becomes the partly linear additive proportional odds model.

Motivated by this close connection with survival models, in this paper we focus on current status data, which arise not only in survival analysis but also in demography, epidemiology, econometrics and bioassay. More specifically, we observe $X = (V, \Delta, Z, W)$, where $V \in \mathbb{R}$ is a random examination time and $\Delta = 1\{U \le V\}$. We assume that $U$ and $V$ are independent given $(Z, W)$. Under current status data, model (1) is also related to the semiparametric binary model studied in econometrics. Since $H$ is increasing, $\Delta = 1$ if and only if $\epsilon \le H(V) - Z'\beta - \sum_{j=1}^{d} h_j(W_j)$, so with the link function $F(\cdot)$ the probability of $\Delta = 1$, given the covariates $(Z, W, V)$, takes the form



$$ P(\Delta = 1 \mid Z, W, V) = F\Big(H(V) - \beta' Z - \sum_{j=1}^{d} h_j(W_j)\Big). \tag{3} $$



Note that Banerjee et al. (2006) and Banerjee et al. (2009) carried out extensive estimation and hypothesis testing for model (3) (without the $h_j$ terms), assuming $F(\cdot)$ to be the complementary log-log function and the logistic function, respectively. An extensive discussion of the relation between (3) and survival models can be found in Doksum & Gasko (1990). Recently, a similar transformation model has been considered by Chen & Tong (2010), but for right-censored data. They showed that the monotone transformation function is root-$n$ estimable, which can never be achieved with current status data. This is the key theoretical difference between the two types of survival data.

In this paper, we employ the B-spline approach to simultaneously estimate the vector $\beta$, the monotone $H$ and the smooth $h_j$'s. The corresponding estimates are denoted as $\hat\beta$, $\hat H$ and $\hat h_j$. In contrast, Ma & Kosorok (2005) apply the penalized NPMLE approach to (1) (with $d = 1$), which yields a nonsmooth step function $\tilde H$ and the penalized estimate $\tilde h$. Our B-spline framework has the following theoretical and computational advantages over the existing penalized NPMLE approach:

1. Our B-spline estimate $\hat H$ is smooth and uniformly consistent. In contrast, $\tilde H$ is always discontinuous (regardless of the smoothness of the true function $H_0$) and has a bias that does not vanish asymptotically. More importantly, the convergence rate of our $\hat H$ ($\hat h$) is shown to be faster than that of $\tilde H$ ($\tilde h$), which is $O_p(n^{-1/3})$. Therefore, we expect more accurate inferences to be drawn from $\hat H$ ($\hat h$).

2. We give an explicit B-spline estimate of the asymptotic covariance of $\hat\beta$, from which an asymptotic confidence interval for $\beta$ can be easily constructed. Its consistency is proven under very weak conditions. In contrast, the block jackknife approach in Ma & Kosorok (2005) requires more computation and is not even theoretically justified.

3. Our spline estimation algorithm requires much less computation than the isotonic-type algorithm used in Ma & Kosorok (2005), since the number of jumps in the step function is expected to be much larger than the number of knots we choose for estimating $H$ and the $h_j$'s.

Despite the non-root-$n$ convergence rates of $\hat H$ and the $\hat h_j$'s, we show that $\hat\beta$ is root-$n$ consistent, asymptotically normal and semiparametric efficient. We derive the efficient information bound using the general two-stage projection approach of Sasieni (1992), which is needed because multiple nonparametric functions are involved in the semiparametric model. Interestingly, we observe a convergence-rate interference phenomenon for the B-spline estimators: the convergence rates of the nonparametric estimators are all slowed down to equal the slowest one. Moreover, by approximating $\log \dot H$ with a B-spline, we avoid the monotonicity constraint in the implementation, which is usually required in the literature, e.g., Zhang et al. (2010).






The remainder of the paper is organized as follows. Section 2 describes the B-spline estimation procedure. Asymptotic properties such as consistency and convergence rates of the estimates are obtained in Section 3. The asymptotic distribution of the parametric component is studied in Section 4, and its efficient information and the corresponding explicit B-spline estimate are given in Section 5. Simulation studies are presented in Section 6. We close with an appendix containing technical details.

2 Semiparametric B-spline Estimation 

2.1 Assumptions 

We first define some notation. For any vector $v$, $v^{\otimes 2} = vv'$. The notations $\gtrsim$ and $\lesssim$ mean greater than, or smaller than, up to a universal constant. We write $A_n \asymp B_n$ if $A_n \lesssim B_n$ and $A_n \gtrsim B_n$. The notations $\mathbb{P}_n$ and $\mathbb{G}_n$ are used for the empirical distribution and the empirical process of the observations, respectively. Furthermore, we use operator notation for evaluating expectations. Thus, for every measurable function $f$ and true probability $P$,

$$ \mathbb{P}_n f = \frac{1}{n} \sum_{i=1}^{n} f(X_i), \qquad P f = \int f \, dP \qquad \text{and} \qquad \mathbb{G}_n f = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \big(f(X_i) - P f\big). $$

We next present some model assumptions. 
M1. $U$ and $V$ are independent given $(Z, W)$.

M2. (a) The covariates $(Z, W)$ are assumed to belong to a bounded subset of $\mathbb{R}^{l+d}$, say $[0, 1]^l \times [0, 1]^d$. The support of $V$ is $[l_v, u_v]$, where $-\infty < l_v < u_v < +\infty$; (b) the joint density of $(Z, V, W)$ with respect to Lebesgue measure stays away from zero, and the joint density of $(V, W)$ stays away from infinity.

M3. $E(Z - E(Z \mid V, W))^{\otimes 2}$ is strictly positive definite.

M4. The residual error distribution $F(\cdot)$ is assumed to be known and has support $\mathbb{R}$. Denote the first, second and third derivatives of $F$ by $f$, $\dot f$ and $\ddot f$, respectively. We assume that (a) $f(u) \vee |\dot f(u)| \vee |\ddot f(u)| \le M < \infty$ over the whole of $\mathbb{R}$, and $f(u)$ stays away from zero on any compact subset of $\mathbb{R}$; (b) $[f^2(v) - \dot f(v) F(v)] \wedge [f^2(v) + \dot f(v)(1 - F(v))] > 0$ for all $v \in \mathbb{R}$.
Since we employ smooth B-spline estimation rather than penalized NPML estimation, our residual error Condition M4 is much less restrictive than that in Ma & Kosorok (2005), and may apply to a more general class of semiparametric transformation models. Note that Condition M4(b) ensures the concavity of the function $s \mapsto \delta \log F(s) + (1 - \delta) \log(1 - F(s))$ for $\delta = 0, 1$: its second derivative equals $-[f^2(s) - \dot f(s) F(s)]/F^2(s)$ when $\delta = 1$ and $-[f^2(s) + \dot f(s)(1 - F(s))]/(1 - F(s))^2$ when $\delta = 0$.

After some algebra, it is easy to verify that Condition M4 is satisfied by the following two general classes of residual error distribution functions.



F1. $F(s) = \int_{-\infty}^{s} \gamma [2\Gamma(1/\gamma)]^{-1} \exp(-|t|^{\gamma}) \, dt$ for $\gamma > 1$ is a family of distributions that includes the standard normal distribution after appropriate rescaling ($\gamma = 2$). The case $\gamma = 2$ corresponds to the probit model (Kalbfleisch & Prentice, 1980).



F2. $F(s) = 1 - [1 + \gamma e^{s}]^{-1/\gamma}$ is a Pareto distribution with parameter $\gamma \in (0, \infty)$ and corresponds to the odds-rate transformation family; see Dabrowska & Doksum (1988a,b). It includes the following two well-known special cases:

(a) As $\gamma \to 0$, it yields the extreme value distribution, i.e., $F(s) = 1 - \exp(-e^{s})$, which corresponds to the complementary log-log transformation; see Banerjee et al. (2006).

(b) For $\gamma = 1$, it gives the logistic distribution, i.e., $F(s) = e^{s}/(1 + e^{s})$, which corresponds to the logit transformation; see Banerjee et al. (2009).
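As a quick numerical sanity check (not a proof) that family F2 satisfies Condition M4(b), the sketch below evaluates the two quantities in M4(b) on a grid for a few values of $\gamma$. It uses closed forms we derived for this illustration with $u = e^{s}$: $f(s) = u(1 + \gamma u)^{-1/\gamma - 1}$ and $\dot f(s) = u(1 - u)(1 + \gamma u)^{-1/\gamma - 2}$, with $\gamma = 0$ standing for the extreme value limit. The grid, the $\gamma$ values and the function names are illustrative assumptions.

```python
import numpy as np

def odds_rate_family(s, gamma):
    """F, 1 - F, f = F' and fdot = F'' for family F2.

    gamma > 0: F(s) = 1 - (1 + gamma * e^s)^(-1/gamma);
    gamma = 0: limiting extreme value case F(s) = 1 - exp(-e^s).
    """
    u = np.exp(s)
    if gamma == 0:
        surv = np.exp(-u)
        F = -np.expm1(-u)                       # 1 - exp(-u) without cancellation
        f = u * surv
        fdot = u * (1.0 - u) * surv
    else:
        log_a = np.log1p(gamma * u)             # log(1 + gamma * u)
        surv = np.exp(-log_a / gamma)
        F = -np.expm1(-log_a / gamma)
        f = u * np.exp(-(1.0 / gamma + 1.0) * log_a)
        fdot = u * (1.0 - u) * np.exp(-(1.0 / gamma + 2.0) * log_a)
    return F, surv, f, fdot

def m4b_quantities(s, gamma):
    """The two quantities that Condition M4(b) requires to be positive."""
    F, surv, f, fdot = odds_rate_family(s, gamma)
    return f**2 - fdot * F, f**2 + fdot * surv

s = np.linspace(-8.0, 4.0, 241)
for gamma in (0.0, 0.5, 1.0, 2.0):
    q1, q2 = m4b_quantities(s, gamma)
    print(f"gamma={gamma}: min q1={q1.min():.3e}, min q2={q2.min():.3e}")
```

For the logistic case ($\gamma = 1$) the two quantities reduce to $u^3/(1+u)^4$ and $u/(1+u)^4$, which gives an independent check of the code.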



2.2 B-spline Estimation Framework 

From now on, we change the signs of $\beta$ and the $h_j$'s for simplicity of exposition. In addition, we re-center $H(v)$ to $H(v) - H(l_v)$ so that $H(l_v) = 0$ for identifiability. The additional parameter $H(l_v)$ is absorbed into the vector $\beta$, i.e., the first coordinate of $z$ is set to one. Given a single observation $x = (v, \delta, z, w)$, the log-likelihood of model (3) is



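$$ \ell(\beta, h_1, \ldots, h_d, H) = \delta \log F\Big(H(v) + \beta' z + \sum_{j=1}^{d} h_j(w_j)\Big) + (1 - \delta) \log\Big\{1 - F\Big(H(v) + \beta' z + \sum_{j=1}^{d} h_j(w_j)\Big)\Big\}. \tag{4} $$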
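In numerical work it helps to evaluate (4) through $\log F$ and $\log(1 - F)$ directly, since $F$ itself underflows or rounds to 1 in the tails. The sketch below does this for the logistic and extreme value cases of F2 and for a standard normal $F$ (probit), with the linear predictor $\eta = H(v) + \beta'z + \sum_j h_j(w_j)$ passed in precomputed. The function names and `family` labels are our own illustrative choices; `scipy.special.log_ndtr` is SciPy's log of the standard normal CDF.

```python
import numpy as np
from scipy.special import log_ndtr

def log_cdf_and_log_survival(eta, family):
    """Return (log F(eta), log(1 - F(eta))) computed in a numerically stable way."""
    eta = np.asarray(eta, dtype=float)
    if family == "logistic":            # F(s) = e^s / (1 + e^s)
        return -np.logaddexp(0.0, -eta), -np.logaddexp(0.0, eta)
    if family == "extreme_value":       # F(s) = 1 - exp(-e^s)
        u = np.exp(eta)
        return np.log(-np.expm1(-u)), -u
    if family == "normal":              # standard normal F (probit)
        return log_ndtr(eta), log_ndtr(-eta)
    raise ValueError(f"unknown family: {family}")

def current_status_loglik(eta, delta, family):
    """Sum over observations of the log-likelihood (4)."""
    log_F, log_S = log_cdf_and_log_survival(eta, family)
    delta = np.asarray(delta, dtype=float)
    return float(np.sum(delta * log_F + (1.0 - delta) * log_S))
```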



We assume that $\beta \in \mathcal{B}$, a bounded open subset of $\mathbb{R}^l$, and that its true value $\beta_0$ is an interior point of $\mathcal{B}$. Before specifying the parameter spaces for $H$ and the $h_j$'s, we first introduce the Hölder ball $\mathcal{H}^r_c(\mathcal{Y})$, a class of smooth functions widely used in nonparametric estimation, e.g., Stone (1982, 1985). Any $f \in \mathcal{H}^r_c(\mathcal{Y})$ is $J < r$ times continuously differentiable on $\mathcal{Y}$, and its $J$-th derivative is uniformly Hölder continuous with exponent $\kappa = r - J \in (0, 1]$, i.e.,

$$ \sup_{y \ne y' \in \mathcal{Y}} \frac{|f^{(J)}(y) - f^{(J)}(y')|}{|y - y'|^{\kappa}} \le c. $$

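The functions in the Hölder ball can always be approximated by a basis expansion, i.e.,

$$ f(t) \approx \sum_{k=1}^{K} \gamma_k B_k(t) = \gamma' B(t), \tag{5} $$

where $\gamma = (\gamma_1, \ldots, \gamma_K)'$ and $B(t) = (B_1(t), \ldots, B_K(t))'$. In fact, if the degree $d$ of the B-spline satisfies $d \ge r - 1$, then for a suitable choice of $\gamma$ we have

$$ \|f - \gamma' B\|_{\infty} = O(K^{-r}) \quad \text{as } K \to \infty, \tag{6} $$

where $\|\cdot\|_{\infty}$ denotes the supremum norm.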
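The approximation rate in (6) is easy to observe numerically. The sketch below, our own illustration rather than part of the paper, builds a clamped cubic B-spline basis on $[0, 1]$ with equally spaced knots using SciPy's `BSpline` (an identity coefficient matrix makes the spline vector-valued, one component per basis function $B_k$), fits $f(t) = \sin(2\pi t)$ by least squares, and records the sup-norm error as the knot spacing is halved. The target function, degree and knot counts are arbitrary choices.

```python
import numpy as np
from scipy.interpolate import BSpline

def clamped_basis(x, n_interior, degree=3, lo=0.0, hi=1.0):
    """Evaluate all basis functions B_1..B_K at x (clamped, equally spaced knots)."""
    inner = np.linspace(lo, hi, n_interior + 2)
    t = np.r_[[lo] * degree, inner, [hi] * degree]   # boundary knots repeated degree + 1 times
    n_basis = len(t) - degree - 1
    return BSpline(t, np.eye(n_basis), degree)(x)     # shape (len(x), K)

x = np.linspace(0.0, 1.0, 2001)
f = np.sin(2 * np.pi * x)
sup_errors = []
for n_interior in (2, 5, 11, 23):                     # knot spacing 1/3, 1/6, 1/12, 1/24
    B = clamped_basis(x, n_interior)
    gamma, *_ = np.linalg.lstsq(B, f, rcond=None)
    sup_errors.append(np.max(np.abs(f - B @ gamma)))
    print(B.shape[1], sup_errors[-1])
```

With nested knot sequences the error should shrink by roughly a factor of $2^{4}$ per halving of the spacing once the cubic spline resolves the function, in line with $K^{-r}$ for $r = 4$.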

We assume the following parameter space Condition P1 for the smooth $h_j$'s.

P1. For $j = 1, \ldots, d$ and some known $c_j$, we assume that the parameter space for $h_j$ is $\mathcal{H}_j$, where

$$ \mathcal{H}_j = \Big\{ h_j : h_j \in \mathcal{H}^{r_j}_{c_j}[0, 1] \text{ with } r_j > 1/2 \text{ and } \int_0^1 h_j(w_j)\, dw_j = 0 \Big\}, $$

and that the corresponding spline space is

$$ \mathcal{H}_{jn} = \Big\{ h_j : h_j(w) = \gamma_j' B_j(w) \text{ with } \|h_j\|_{\infty} \le c_j \text{ and } \int_0^1 h_j(w_j)\, dw_j = 0 \Big\}, $$

based on a system of basis functions $B_j = (B_{j1}, \ldots, B_{jK_j})'$ of degree $d_j \ge r_j - 1$.

As seen from the previous examples, it is reasonable to assume that $H(\cdot)$ is differentiable and strictly increasing over $[l_v, u_v]$, with $\dot H(v) \ge c_0 > 0$. Since $H(l_v) = 0$, we can thus write $H(v) = \int_{l_v}^{v} \exp(g(s))\, ds$, where $g(v) = \log \dot H(v)$ is well defined. This reparametrization gets around the strict monotonicity and positivity constraints on $H$, and thus avoids constrained optimization in the computation. The parameter space Condition P2 for $g$ is specified below.
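To see why no constraint is needed, note that any real-valued $g$, for example a spline $\gamma_0' B_0$ with unrestricted coefficients, yields an $H$ that vanishes at $l_v$ and is strictly increasing. The sketch below computes $H$ on a grid by the trapezoid rule; the grid, the test functions and the helper name are illustrative assumptions, and trapezoidal quadrature is just one convenient choice.

```python
import numpy as np

def transformation_from_g(g_values, grid):
    """H on `grid` with H(grid[0]) = 0 and H' = exp(g), via the trapezoid rule."""
    eg = np.exp(g_values)                              # always positive, whatever the sign of g
    increments = 0.5 * (eg[1:] + eg[:-1]) * np.diff(grid)
    return np.concatenate(([0.0], np.cumsum(increments)))

grid = np.linspace(0.2, 1.8, 161)                      # plays the role of [l_v, u_v]
H = transformation_from_g(3.0 * np.sin(5.0 * grid), grid)   # arbitrary, sign-changing g
```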
P2. For some known $c_0$, we assume that the parameter space for $g$ is $\mathcal{G}$, where

$$ \mathcal{G} = \{ g : g \in \mathcal{H}^{r_0}_{c_0}[l_v, u_v] \text{ with } r_0 > 1/2 \}, $$

and that the corresponding spline space is

$$ \mathcal{G}_n = \{ g : g(v) = \gamma_0' B_0(v) \text{ and } \|g\|_{\infty} \le c_0 \}, $$

based on a system of basis functions $B_0 = (B_{01}, \ldots, B_{0K_0})'$ of degree $d_0 \ge r_0 - 1$.

Similarly, we define $\mathcal{G}'_n = \{ H(v) = \int_{l_v}^{v} \exp(g(s))\, ds : g \in \mathcal{G}_n \}$. By some algebra, we can show that $H \in \mathcal{H}^{r_0 + 1}_{\bar c_0}[l_v, u_v]$ for some $\bar c_0 < \infty$.



Remark 1. In the theoretical proofs and numerical calculations, the exact values of the $c_j$'s are not needed. Only the boundedness condition, equivalently the compactness of the parameter spaces and spline spaces, is needed. We impose this boundedness condition, which can be relaxed by invoking chaining arguments, only to simplify our theoretical derivations.

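We now describe the proposed B-spline approach to estimating $H$ and the $h_j$'s. Let $\mathcal{A} = \mathcal{B} \times \mathcal{G} \times \prod_{j=1}^{d} \mathcal{H}_j$ and $\mathcal{A}_n = \mathcal{B} \times \mathcal{G}_n \times \prod_{j=1}^{d} \mathcal{H}_{jn}$. Denote $\alpha = (\beta', g, h_1, \ldots, h_d)'$ and its true value $\alpha_0 = (\beta_0', g_0, h_{10}, \ldots, h_{d0})'$, where $g_0(\cdot) = \log \dot H_0(\cdot)$. The log-likelihood (4) for observation $i$ can thus be reparametrized as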



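$$ \ell_i(\alpha) = \delta_i \log F\Big(\int_{l_v}^{v_i} \exp(g(s))\, ds + \beta' z_i + \sum_{j=1}^{d} h_j(w_{ij})\Big) + (1 - \delta_i) \log\Big\{1 - F\Big(\int_{l_v}^{v_i} \exp(g(s))\, ds + \beta' z_i + \sum_{j=1}^{d} h_j(w_{ij})\Big)\Big\}. \tag{7} $$

The corresponding B-spline estimate $\hat\alpha$ is defined as

$$ \hat\alpha = \arg\max_{\alpha \in \mathcal{A}_n} \sum_{i=1}^{n} \ell_i(\alpha). \tag{8} $$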



We can also write $\hat\alpha = (\hat\beta', \hat g, \hat h_1, \ldots, \hat h_d)' = (\hat\beta', \hat\gamma_0' B_0, \hat\gamma_1' B_1, \ldots, \hat\gamma_d' B_d)'$. The estimate of $H$ is then $\hat H(v) = \int_{l_v}^{v} \exp(\hat\gamma_0' B_0(s))\, ds$. Some tedious algebra reveals that the Hessian matrix of $\ell_i(\alpha)$ with respect to $(\beta', \gamma_0', \gamma_1', \ldots, \gamma_d')'$ is negative semidefinite under Condition M4(b), which guarantees the existence of $\hat\alpha$. See the simulation section for more discussion of computational feasibility. The above estimation procedure also applies to other linear sieves approximating the Hölder ball (or, more generally, Hölder space), e.g., wavelets.
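To make (7)–(8) concrete, here is a minimal sketch of the sieve maximum likelihood fit for $d = 1$ with a logistic $F$ on simulated data. Everything about the design (true $H_0(v) = v + v^2/2$, $h_0(w) = \sin(2\pi w)$, $\beta_0 = (-1, 0.5)'$ with an intercept, the knot counts and $n$) is an illustrative assumption, as are the helper names. The constraint $\int_0^1 h(w)\,dw = 0$ is imposed by subtracting from each basis function its integral and dropping one column, $\int_0^v \exp(\gamma_0'B_0)$ is approximated by the trapezoid rule, and a generic quasi-Newton optimizer (SciPy's BFGS) is used without monotonicity constraints and without the bounds $\|g\|_\infty \le c_0$, $\|h\|_\infty \le c_1$.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

def bspline_basis(lo, hi, n_interior, degree=2):
    """Clamped B-spline basis on [lo, hi] (equally spaced interior knots) and the
    integral of each basis function over [lo, hi]."""
    inner = np.linspace(lo, hi, n_interior + 2)
    t = np.r_[[lo] * degree, inner, [hi] * degree]
    n_basis = len(t) - degree - 1
    integrals = (t[degree + 1:] - t[:n_basis]) / (degree + 1)
    return BSpline(t, np.eye(n_basis), degree), integrals

# Toy current status data with logistic F (all design choices are illustrative).
rng = np.random.default_rng(0)
n = 1000
V = rng.uniform(0.0, 2.0, n)                                   # [l_v, u_v] = [0, 2]
W = rng.uniform(0.0, 1.0, n)
Z = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])   # first coordinate is the intercept
beta_true = np.array([-1.0, 0.5])
eta_true = V + V**2 / 2 + Z @ beta_true + np.sin(2 * np.pi * W)  # H0(v) = v + v^2/2
delta = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

# Sieve spaces: quadratic splines for g = log H' and for the centred h.
basis_g, _ = bspline_basis(0.0, 2.0, n_interior=3)
grid = np.linspace(0.0, 2.0, 401)
Bg = basis_g(grid)                                  # basis for g on a quadrature grid
basis_h, int_h = bspline_basis(0.0, 1.0, n_interior=3)
Bh = (basis_h(W) - int_h)[:, :-1]                   # columns integrate to 0; drop one (they sum to 0)
p, K0, K1 = Z.shape[1], Bg.shape[1], Bh.shape[1]

def H_at(gamma0, v):
    """H(v) = int_0^v exp(gamma0' B0(s)) ds by the trapezoid rule, interpolated at v."""
    eg = np.exp(np.clip(Bg @ gamma0, -30.0, 30.0))    # clipping only guards against overflow
    H_grid = np.r_[0.0, np.cumsum(0.5 * (eg[1:] + eg[:-1]) * np.diff(grid))]
    return np.interp(v, grid, H_grid)

def neg_loglik(theta):
    """Minus the average of (7) for logistic F: delta*eta - log(1 + e^eta)."""
    beta, gamma0, gamma1 = theta[:p], theta[p:p + K0], theta[p + K0:]
    eta = H_at(gamma0, V) + Z @ beta + Bh @ gamma1
    return -np.mean(delta * eta - np.logaddexp(0.0, eta))

theta0 = np.zeros(p + K0 + K1)
fit = minimize(neg_loglik, theta0, method="BFGS")   # unconstrained in (beta, gamma0, gamma1)
beta_hat = fit.x[:p]
H_hat = H_at(fit.x[p:p + K0], grid)
print("beta_hat:", beta_hat)
```

Because $H$ is built from $\exp(g)$, the fitted $\hat H$ is increasing by construction, which is the point of the reparametrization.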
3 Consistency and Rates of Convergence



In this section, we show that our B-spline estimate is consistent and that the convergence rates of the nonparametric estimates interfere with each other. Define

$$ d(\alpha, \alpha_0) = \|\beta - \beta_0\| + \|H - H_0\|_2 + \sum_{j=1}^{d} \|h_j - h_{j0}\|_2, $$

where $\|\cdot\|_2$ is the $L_2$ norm. We now give the main theorem of this section.

Theorem 1. Suppose that Conditions M1–M4 and P1–P2 hold. If $K_j/n \to 0$ for $j = 0, 1, \ldots, d$, then

$$ d(\hat\alpha, \alpha_0) = o_P(1). \tag{9} $$

More specifically, we further prove that

$$ d(\hat\alpha, \alpha_0) = O_P\Big(\max_{0 \le j \le d} \big\{ K_j^{-r_j} \vee \sqrt{K_j/n} \big\}\Big). \tag{10} $$

If we further require that $K_j \asymp n^{1/(2r_j + 1)}$ for $j = 0, \ldots, d$, then

$$ d(\hat\alpha, \alpha_0) = O_P\big(n^{-r/(2r + 1)}\big), \tag{11} $$

where $r = \min_{0 \le j \le d} \{r_j\}$.
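The interference in (10)–(11) is simple arithmetic: with $K_j = n^{1/(2r_j+1)}$ (setting the constants hidden in $\asymp$ to one, an assumption), both terms of component $j$ equal $n^{-r_j/(2r_j+1)}$, and the maximum over $j$ is driven by the roughest component. The sketch below evaluates the bound for illustrative smoothness values $r_0 = 2$ and $r_1 = 1.5$; the result $n^{-0.375}$ is still faster than $n^{-1/3}$.

```python
import math

def theorem1_bound(n, smoothness):
    """max_j {K_j^(-r_j) v sqrt(K_j / n)} from (10), with K_j = n^(1/(2 r_j + 1))."""
    terms = []
    for r in smoothness:
        K = n ** (1.0 / (2.0 * r + 1.0))                  # knot-count choice for this component
        terms.append(max(K ** (-r), math.sqrt(K / n)))    # bias term vs. variance term
    return max(terms)

n = 1600
smoothness = [2.0, 1.5]                                   # r_0 for g, r_1 for h (illustrative)
bound = theorem1_bound(n, smoothness)
r = min(smoothness)
print(bound, n ** (-r / (2 * r + 1)), n ** (-1.0 / 3.0))
```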



According to Theorem 1, the smooth $\hat H$ achieves the convergence rate $O_P(n^{-r/(2r+1)})$, which is faster than the $n^{-1/3}$ rate derived in the penalized estimation context (see Ma & Kosorok, 2005) whenever $g_0$ and the $h_{j0}$'s are all at least continuously differentiable, i.e., $r > 1$; indeed, $r/(2r+1) > 1/3$ if and only if $r > 1$. More importantly, we can further show that $\hat H$ is uniformly consistent, i.e., $\|\hat H - H_0\|_{\infty} = o_P(1)$, by applying Lemma 2 in Chen & Shen (1998), which bounds $\|f\|_{\infty}$ by a positive power of $\|f\|_2$ for any $f$ in a Hölder ball $\mathcal{H}^r_c[a, b]^d$, and noting that $\hat H, H_0 \in \mathcal{H}^{r_0 + 1}_{\bar c_0}[l_v, u_v]$ for some $\bar c_0 > 0$.

The above theorem also holds when we employ a constrained monotone B-spline to approximate the transformation, i.e., a spline $\gamma_0' B_0$ with nondecreasing coefficients $\gamma_{01} \le \gamma_{02} \le \cdots \le \gamma_{0K_0}$. However, such constrained optimization usually requires additional computational effort; see Zhang et al. (2010).

Remark 2. From Theorem 1, we observe an interesting convergence-rate interference phenomenon: the convergence rate of each B-spline estimate is forced to equal the slowest one. Ma & Kosorok (2005) also show that the convergence rate of the penalized estimate $\tilde h$ is slowed down to $O_p(n^{-1/3})$ by the NPMLE $\tilde H$, regardless of the smoothness of $h_0$. One possible way to achieve the optimal rate for each nonparametric estimate is to extend the recent mixed-rate asymptotic results of Radchenko (2008) to the semiparametric setting.

Since we assume that $r > 1/2$, the convergence rate given in (11) is always $o_P(n^{-1/4})$, because $r/(2r+1) > 1/4$ exactly when $r > 1/2$. Such a rate is usually fast enough to guarantee regular asymptotic behavior of $\hat\beta$, i.e., $\sqrt{n}$-consistency and asymptotic normality. Indeed, in the next section we improve the suboptimal rate of $\hat\beta$ in (11) to the optimal $\sqrt{n}$ rate, and further show that $\hat\beta$ is semiparametric efficient.



4 Weak Convergence of the Parametric Estimate 



In this section, we study the weak convergence of the spline estimate $\hat\beta$ in the presence of multiple nonparametric nuisance functions. We first calculate the semiparametric efficient information based on the projection onto the nonorthogonal sumspace. Let

$$ Q_\theta(x) = f(\theta) \Big( \frac{\delta}{F(\theta)} - \frac{1 - \delta}{1 - F(\theta)} \Big), $$

where $\theta(z, v, w) = \beta' z + H(v) + \sum_{j=1}^{d} h_j(w_j)$. Denote the true value of $\theta$ by $\theta_0$. The score functions (operators) for $\beta$, $g$ and $h_j$ are, respectively,

$$ \dot\ell_\beta(X; \alpha) = Z Q_\theta(X), \tag{12} $$

$$ \ell_g[a](X; \alpha) = \int_{l_v}^{V} \exp(g(s))\, a(s)\, ds \; Q_\theta(X), \tag{13} $$

$$ \ell_{h_j}[b_j](X; \alpha) = b_j(W_j)\, Q_\theta(X). \tag{14} $$

We assume that $a \in L_2(H) = \{a : \int_{l_v}^{u_v} a^2(s)\, dH(s) < \infty\}$ and $b_j \in L_2^0(w_j) = \{b_j : \int_0^1 b_j(w_j)\, dw_j = 0 \text{ and } \int_0^1 b_j^2(w_j)\, dw_j < \infty\}$, so that all the score functions defined above are square integrable.
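For the logistic choice of $F$ (where $f = F(1 - F)$), $Q_\theta$ collapses to $\delta - F(\theta)$. The sketch below codes $Q_\theta$ and the $\beta$-score (12) and checks the score against a central finite-difference derivative of the log-likelihood (4), holding $H(v) + \sum_j h_j(w_j)$ fixed as an offset. The data are random placeholders, not from the paper, and the function names are ours.

```python
import numpy as np

def logistic_cdf(x):
    return 1.0 / (1.0 + np.exp(-x))

def q_theta(theta, delta):
    """Q_theta = f(theta) * [delta / F(theta) - (1 - delta) / (1 - F(theta))], logistic F."""
    F = logistic_cdf(theta)
    f = F * (1.0 - F)                          # logistic density f = F(1 - F)
    return f * (delta / F - (1.0 - delta) / (1.0 - F))

def loglik(beta, z, offset, delta):
    """Log-likelihood (4) summed over observations; offset = H(v) + sum_j h_j(w_j)."""
    F = logistic_cdf(z @ beta + offset)
    return np.sum(delta * np.log(F) + (1.0 - delta) * np.log(1.0 - F))

def score_beta(beta, z, offset, delta):
    """Score (12) summed over observations: sum_i Z_i Q_theta(X_i)."""
    return z.T @ q_theta(z @ beta + offset, delta)

rng = np.random.default_rng(1)
z = rng.normal(size=(20, 3))
offset = rng.normal(size=20)
delta = rng.integers(0, 2, size=20).astype(float)
beta = np.array([0.3, -0.2, 0.1])
h = 1e-6
fd = np.array([(loglik(beta + h * e, z, offset, delta) - loglik(beta - h * e, z, offset, delta)) / (2 * h)
               for e in np.eye(3)])
```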

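To calculate the efficient score function $\tilde\ell_\beta$, we need to find the projection of $\dot\ell_\beta$ onto the sumspace $\mathcal{A} = \mathcal{A}_g + \mathcal{A}_{h_1} + \cdots + \mathcal{A}_{h_d}$, where $\mathcal{A}_g = \{\ell_g[a] : a \in L_2(H)\}$ and $\mathcal{A}_{h_j} = \{\ell_{h_j}[b_j] : b_j \in L_2^0(w_j)\}$.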



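For simplicity, we abbreviate $\dot\ell_\beta(X; \alpha_0)$ and $\dot\ell_\beta(X; \hat\alpha)$ as $\dot\ell_{\beta_0}$ and $\dot\ell_{\hat\beta}$, respectively. The same notation rule applies to $\ell_g[a](X; \alpha)$ and $\ell_{h_j}[b_j](X; \alpha)$. We define

$$ \tilde\ell_\beta(X; \alpha) = \dot\ell_\beta(X; \alpha) - \ell_g[a^*](X; \alpha) - \sum_{j=1}^{d} \ell_{h_j}[b_j^*](X; \alpha), $$

where $a^* = (a_1^*, \ldots, a_l^*)'$ and $b_j^* = (b_{j1}^*, \ldots, b_{jl}^*)'$, and $\ell_g[a_k^*] + \sum_{j=1}^{d} \ell_{h_j}[b_{jk}^*]$ is the projection of the $k$-th component of $\dot\ell_\beta$ onto $\mathcal{A}$, for $k = 1, \ldots, l$. Similarly, denote $\tilde\ell_\beta(X; \alpha_0)$ and $\tilde\ell_\beta(X; \hat\alpha)$ as $\tilde\ell_{\beta_0}$ and $\tilde\ell_{\hat\beta}$, respectively. By taking the two-stage projection approach of Sasieni (1992), we have

$$ \tilde\ell_{\beta_0} = \Big\{ Z - b^*(W) - \frac{E[(Z - b^*(W))\, Q_{\theta_0}^2(X) \mid V]}{E[Q_{\theta_0}^2(X) \mid V]} \Big\} Q_{\theta_0}(X), \tag{15} $$

where $b^*(W) = \sum_{j=1}^{d} b_j^*(W_j)$ satisfies

$$ P\big(\tilde\ell_{\beta_0, k}\, \ell_{h_j}[b_j]\big) = 0 \tag{16} $$

for every $b_j \in L_2^0(w_j)$, $j = 1, \ldots, d$ and $k = 1, \ldots, l$, with $\tilde\ell_{\beta_0, k}$ the $k$-th component of $\tilde\ell_{\beta_0}$. By slightly modifying the proof of Lemma 4 in Ma & Kosorok (2005), we can show that the above nonorthogonal projection is well defined and that $b^*(\cdot)$ exists by the alternating projection Theorem A.4.2 in Bickel et al. (1993).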
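The existence argument rests on alternating projections: the projection onto a sum of closed, nonorthogonal subspaces can be obtained by cycling through the individual projections, a backfitting-type scheme. The finite-dimensional sketch below illustrates that idea only. The random matrices `A_g` and `A_h` are hypothetical stand-ins for discretised versions of $\mathcal{A}_g$ and $\mathcal{A}_{h_1}$, the residual plays the role of the efficient score, and none of this is the paper's computation of $b^*$.

```python
import numpy as np

def project(A, y):
    """Orthogonal projection of y onto the column space of A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

def project_onto_sum(y, spaces, n_sweeps=500):
    """Project y onto span(A_1) + ... + span(A_m) by cyclic alternating projections."""
    parts = [np.zeros_like(y) for _ in spaces]
    for _ in range(n_sweeps):
        for k, A in enumerate(spaces):
            others = sum(p for j, p in enumerate(parts) if j != k)
            parts[k] = project(A, y - others)
    return sum(parts)

rng = np.random.default_rng(2)
n = 50
A_g = rng.normal(size=(n, 3))
A_h = A_g @ rng.normal(size=(3, 4)) * 0.5 + rng.normal(size=(n, 4))   # correlated, nonorthogonal
y = rng.normal(size=n)
efficient_part = y - project_onto_sum(y, [A_g, A_h])                  # analogue of the efficient score
```

The residual is orthogonal to both spaces at once, mirroring the orthogonality conditions that characterise $\tilde\ell_{\beta_0}$.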
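Define $\Pi_j$ and $\Pi_a$ as the projection operators

$$ \Pi_j : g \mapsto \frac{E[g(V, W)\, Q_{\theta_0}^2 \mid W_j = w_j]}{E[Q_{\theta_0}^2 \mid W_j = w_j]} \qquad \text{and} \qquad \Pi_a : g \mapsto \frac{E[g(V, W)\, Q_{\theta_0}^2 \mid V = v]}{E[Q_{\theta_0}^2 \mid V = v]}, $$

respectively. Define

$$ D(v, w) = \frac{E[Z Q_{\theta_0}^2 \mid V = v, W = w]}{E[Q_{\theta_0}^2 \mid V = v, W = w]}, \qquad S(v, w_j) = \frac{E[Q_{\theta_0}^2 \mid V = v, W_j = w_j]}{E[Q_{\theta_0}^2 \mid W_j = w_j]}, $$

$$ T(w_i, w_j) = \frac{E[Q_{\theta_0}^2 \mid W_i = w_i, W_j = w_j]}{E[Q_{\theta_0}^2 \mid W_j = w_j]}, \qquad U(w_j, v) = \frac{E[Q_{\theta_0}^2 \mid W_j = w_j, V = v]}{E[Q_{\theta_0}^2 \mid V = v]}. $$

We say a function $f(s, t)$ belongs to a uniform Hölder ball $\mathcal{H}^r_c(\mathcal{S} \times \mathcal{T})$ in $t$ relative to $s$ if it is $J < r$ times continuously differentiable with respect to $t$ and its $J$-th partial derivative satisfies, with $\kappa = r - J$,

$$ \sup_{s \in \mathcal{S}} \sup_{t \ne t' \in \mathcal{T}} \frac{|\partial_t^J f(s, t) - \partial_t^J f(s, t')|}{|t - t'|^{\kappa}} \le c. $$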
Define $S_f(v, w_j) = S(v, w_j) f_{V|W_j}(v, w_j)$, $T_f(w_i, w_j) = T(w_i, w_j) f_{W_i|W_j}(w_i, w_j)$ and $U_f(w_j, v) = U(w_j, v) f_{W_j|V}(w_j, v)$, where $f_{V|W_j}$, $f_{W_i|W_j}$ and $f_{W_j|V}$ are the conditional densities of $V$ given $W_j$, $W_i$ given $W_j$, and $W_j$ given $V$ with respect to Lebesgue measure, respectively.

Here, we impose some model assumptions implying that both $b_{jk}^*$ and $a_k^*$ belong to some Hölder balls for any $j = 1, \ldots, d$ and $k = 1, \ldots, l$.

M5. We assume that $[\Pi_j D(v, w)]_k \in \mathcal{H}^{r_j}_{c_j}[0, 1]$, $S_f(v, w_j) \in \mathcal{H}^{r_j}_{c_j}([l_v, u_v] \times [0, 1])$ in $w_j$ relative to $v$, and $T_f(w_i, w_j) \in \mathcal{H}^{r_j}_{c_j}([0, 1]^2)$ in $w_j$ relative to $w_i$, for some $0 < c_j < \infty$ and $j = 1, \ldots, d$.

M6. We assume that $[\Pi_a D(v, w)]_k \in \mathcal{H}^{r_0 + 1}_{c_0}[l_v, u_v]$ and $U_f(w_j, v) \in \mathcal{H}^{r_0 + 1}_{c_0}([0, 1] \times [l_v, u_v])$ in $v$ relative to $w_j$, for some $0 < c_0 < \infty$.

Note that we can simplify $S_f(v, w_j)$ ($T_f(w_i, w_j)$) to $S(v, w_j)$ ($T(w_i, w_j)$) in Condition M5, and $U_f(w_j, v)$ to $U(w_j, v)$ in Condition M6, when we assume that $V$ and $W$ are independent and that the components of $W$ are pairwise independent.

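Theorem 2. Suppose that Conditions M1–M6 and P1–P2 hold. If $K_j \asymp n^{1/(2r_j + 1)}$ and $I_0$ is invertible, then

$$ \sqrt{n}(\hat\beta - \beta_0) = I_0^{-1} \sqrt{n}\, \mathbb{P}_n \tilde\ell_{\beta_0} + o_P(1) \rightsquigarrow N(0, I_0^{-1}), \tag{17} $$

where $I_0 = E\, \tilde\ell_{\beta_0}^{\otimes 2}$ is the efficient information matrix.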



5 B-spline Estimate of the Efficient Information

In this section, we give an explicit B-spline estimate of the efficient information as a by-product of establishing the asymptotic normality of $\hat\beta$. It is simply the observed information matrix obtained by treating the semiparametric model as a parametric one after the B-spline approximation, i.e., taking $\mathcal{H}_j = \mathcal{H}_{jn}$ and $\mathcal{G} = \mathcal{G}_n$. Specifically, we treat $\ell_i(\alpha)$ defined in (7) as if it were a parametric likelihood $\ell_i(\beta, \gamma_0, \gamma_1, \ldots, \gamma_d)$.

We construct the corresponding information estimator for $(\beta', \gamma_0', \gamma_1', \ldots, \gamma_d')'$:

$$ \hat I_n = \begin{pmatrix} \hat\Sigma_{11} & \hat\Sigma_{12} \\ \hat\Sigma_{21} & \hat\Sigma_{22} \end{pmatrix}, $$

where $\hat\Sigma_{jk} = \sum_{i=1}^{n} A_j(X_i; \hat\alpha) A_k(X_i; \hat\alpha)'/n$ for $j, k = 1, 2$, and

$$ A_1(X; \alpha) = \dot\ell_\beta(X; \alpha), \qquad A_2(X; \alpha) = \big(\ell_g[B_{01}], \ldots, \ell_g[B_{0K_0}], \ell_{h_1}[B_{11}], \ldots, \ell_{h_d}[B_{dK_d}]\big)'(X; \alpha). $$
The parametric inferences imply that the information estimator for $\beta$ is of the form

$$ \hat I = \hat\Sigma_{11} - \hat\Sigma_{12} \hat\Sigma_{22}^{-1} \hat\Sigma_{21}. \tag{18} $$



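Some calculations further reveal that

$$ \hat I = \mathbb{P}_n \Big\{ \dot\ell_{\hat\beta} - \ell_g[B_0' \hat\gamma_0^*] - \sum_{j=1}^{d} \ell_{h_j}[B_j' \hat\gamma_j^*] \Big\}^{\otimes 2}, \tag{19} $$

where $[\hat\gamma_j^*]_{K_j \times l} = (\hat\gamma_j^{*1}, \ldots, \hat\gamma_j^{*l})$ for $j = 0, 1, \ldots, d$, and $((\hat\gamma_0^{*k})', \ldots, (\hat\gamma_d^{*k})')' = \hat\Sigma_{22}^{-1} \hat\Sigma_{21} 1_k$, where $1_k$ represents the $l$-vector with its $k$-th element equal to one and all other elements zero. We will use (18) as our estimator of $I_0$.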
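The step from (18) to (19) is the familiar fact that a Schur complement equals the second moment of the residual from regressing $A_1$ on $A_2$, and that its inverse is the $\beta$-block of $\hat I_n^{-1}$. The sketch below checks both identities numerically. The score rows are random placeholders standing in for $A_1(X_i; \hat\alpha)$ and $A_2(X_i; \hat\alpha)$, and the function names are ours.

```python
import numpy as np

def information_estimate(A1, A2):
    """Estimator (18): Sigma11 - Sigma12 Sigma22^{-1} Sigma21.

    A1: (n, l) rows A_1(X_i; alpha_hat); A2: (n, q) rows A_2(X_i; alpha_hat).
    """
    n = A1.shape[0]
    s11 = A1.T @ A1 / n
    s12 = A1.T @ A2 / n
    s22 = A2.T @ A2 / n
    return s11 - s12 @ np.linalg.solve(s22, s12.T)

def information_estimate_residual(A1, A2):
    """Equivalent form (19): empirical second moment of A1 minus its regression on A2."""
    n = A1.shape[0]
    coef = np.linalg.solve(A2.T @ A2, A2.T @ A1)     # q x l, i.e. Sigma22^{-1} Sigma21
    resid = A1 - A2 @ coef
    return resid.T @ resid / n

rng = np.random.default_rng(3)
A2 = rng.normal(size=(200, 6))
A1 = A2[:, :2] * 0.7 + rng.normal(size=(200, 2))     # placeholder scores correlated with A2
I_hat = information_estimate(A1, A2)
```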

We need the following additional assumption for Theorem 3.

M7. We assume that, for $k = 1, \ldots, l$,

$$ E \Big[ \int_{l_v}^{V} \{\exp(g(s)) - \exp(g_0(s))\}\, a_k^*(s)\, ds \Big]^2 \lesssim \|H - H_0\|_2^2. $$

Theorem 3. Under Conditions M1–M7 and P1–P2, we have $\hat I \to I_0$ in probability.



6 Numerical Results 



6.1 Simulations 



We perform a Monte Carlo study to assess the finite-sample performance of our proposed method. To compare with the penalized NPMLE in Ma & Kosorok (2005), we adopt the same setting used in their paper. We simulate current status data from the partly linear additive Cox model, which is a special case of the general transformation model. We choose $H(u) = \log \Lambda(e^{u})$, where $\Lambda(u) = e^{k_0}(\exp(u/3) - 1)$ with $k_0 = 0.06516$. The errors $\epsilon$ follow an extreme value distribution with $F(s) = 1 - \exp(-e^{s})$. The regression coefficients are $\beta_1 = 0.3$ and $\beta_2 = 0.25$. The covariate $Z_1$ is Uniform[0.5, 1.5] and $Z_2$ is Bernoulli with success probability 0.5. We choose $W$ as Uniform[1, 10] and $h(w) = \sin(w/1.2 - 1) - k_0$, so that $k_0$ centers $h$ over $[1, 10]$. Censoring times follow the standard exponential distribution conditional on lying in the interval [0.2, 1.8]. The sample sizes are $n = 400$ and $n = 1600$, and we simulate 400 realizations for each sample size.
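A minimal sketch of a data generator for this design follows. Two details are our assumptions rather than statements from the text: the Cox risk score is taken as $\beta'z + h(w)$ with a positive sign (the paper flips signs between (1), (2) and (4)), and the examination time is drawn by inverting the truncated exponential distribution function. The names are illustrative.

```python
import numpy as np

K0 = 0.06516   # centring constant k_0 from the text

def cumulative_hazard(t):
    """Lambda(t) = e^{k_0} (exp(t / 3) - 1)."""
    return np.exp(K0) * np.expm1(t / 3.0)

def inverse_cumulative_hazard(x):
    return 3.0 * np.log1p(x * np.exp(-K0))

def simulate_current_status(n, rng, beta=(0.3, 0.25)):
    z1 = rng.uniform(0.5, 1.5, n)
    z2 = rng.binomial(1, 0.5, n)
    w = rng.uniform(1.0, 10.0, n)
    h = np.sin(w / 1.2 - 1.0) - K0
    risk = beta[0] * z1 + beta[1] * z2 + h            # assumed sign convention
    # P(T > t | z, w) = exp(-Lambda(t) e^risk), inverted at a standard exponential draw
    t = inverse_cumulative_hazard(rng.exponential(1.0, n) * np.exp(-risk))
    # standard exponential examination time conditioned on [0.2, 1.8], by inversion
    a, b = np.exp(-0.2), np.exp(-1.8)
    v = -np.log(a - rng.uniform(size=n) * (a - b))
    delta = (t <= v).astype(int)
    return v, delta, np.column_stack([z1, z2]), w

rng = np.random.default_rng(4)
v, delta, z, w = simulate_current_status(1600, rng)
```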

In practice, the numbers of knots for $H$ and the $h_j$'s need to be determined. Common model selection criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) can be employed to select the number of knots. In this paper, we determine $K_0, K_1, \ldots, K_d$ by the AIC given by

$$ \mathrm{AIC} = -2 \sum_{i=1}^{n} \ell_i(\hat\alpha) + 2\Big(l + \sum_{j=0}^{d} K_j\Big). $$

In our simulation, we use a quadratic spline to approximate both the function $h$ and the function $g$ in $H$. Then $\mathrm{AIC} = -2 \sum_{i=1}^{n} \ell_i(\hat\alpha) + 2(K_0 + K_1 + 2)$. Based on our experience, it is generally adequate to choose fewer than ten knots to achieve a reasonable approximation, provided that $h$ and $H$ are not overly erratic. Figure 1 shows the AIC scores under different combinations of $K_0$ and $K_1$ for one realization of the simulation with sample size $n = 1600$. It shows that the optimal choices for $K_0$ and $K_1$ are 5 and 5, respectively. The estimated $h$ and $H$ for various values of $K_0$ and $K_1$ are plotted in Figure 2. In the left panel of Figure 2, we fix $K_0 = 5$ and plot the estimated $h$ with $K_1 = 3, 5, 10$. When $K_1$ is small (e.g., $K_1 = 3$), there seems to be a large bias in our estimator. On the other hand, when $K_1$ is large (e.g., $K_1 = 10$), the estimator displays a wiggly behavior. In the right panel of Figure 2, we fix $K_1 = 5$ and plot the estimated $H$ with $K_0 = 5, 7, 10$. As the number of knots increases, the estimated $H$ shows a similar wiggly shape. Hence, the numbers of knots should be chosen with caution.
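The AIC search described above is a plain grid search. In the sketch below the fitting step is abstracted as a callback `fit_loglik(K0, K1)`, which is hypothetical: in practice it would run the sieve fit (8) and return $\sum_i \ell_i(\hat\alpha)$. Only the bookkeeping is shown, and `n_beta` is $l$, the dimension of $\beta$.

```python
import itertools

def aic(max_loglik, n_beta, basis_counts):
    """AIC = -2 * (maximised log-likelihood) + 2 * (l + sum_j K_j)."""
    return -2.0 * max_loglik + 2.0 * (n_beta + sum(basis_counts))

def select_by_aic(fit_loglik, n_beta, k0_grid, k1_grid):
    """Grid search over (K_0, K_1); fit_loglik(K0, K1) is a hypothetical callback returning
    the maximised log-likelihood for that choice."""
    scores = {
        (k0, k1): aic(fit_loglik(k0, k1), n_beta, (k0, k1))
        for k0, k1 in itertools.product(k0_grid, k1_grid)
    }
    best = min(scores, key=scores.get)
    return best, scores
```

A toy callback whose log-likelihood stops improving beyond five knots per component is enough to see the penalty pick $(K_0, K_1) = (5, 5)$.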

Simulation results show that our B-spline estimation procedure performs quite well in the semiparametric transformation model. The biases and standard errors of the spline estimates of $\beta_1$ and $\beta_2$ are given in Table 1. The table shows that the sample biases of both $\hat\beta_1$ and $\hat\beta_2$ are small. The ratio of the standard errors for the two sample sizes is close to 2, which is consistent with a $\sqrt{n}$ convergence rate for $\hat\beta_1$ and $\hat\beta_2$ (quadrupling $n$ should halve the standard error). The estimated standard errors from (18) (denoted ESD) are also displayed in Table 1 and are very close to the simulation results. Although our proposed method tends to overestimate the standard error slightly, the overestimation lessens as the sample size increases. The 95% confidence intervals constructed from (18) generally have coverage close to the nominal value. Histograms of $\hat\beta_1$ and $\hat\beta_2$ are shown in Figure 3; the marginal distributions of $\hat\beta_1$ and $\hat\beta_2$ appear approximately Gaussian. The left panel of Figure 4 displays the spline estimate of $h(w)$, and the monotone
Table 1: Monte Carlo results for the partly linear Cox model with current status data based on 400 replicates

                       Sample size 400    Sample size 1600
β1      Bias           0.0318             0.0100
        SD             0.2919             0.1246
        ESD            0.3102             0.1325
        Coverage       0.9620             0.9690
β2      Bias           0.0168             0.0074
        SD             0.1533             0.0797
        ESD            0.1612             0.0803
        Coverage       0.9710             0.9680
Joint   Coverage       0.9620             0.9550

SD: Standard error; ESD: Estimated standard error



estimate $\hat H$ is given in the right panel of Figure 4. The dashed line is the true function, the solid line is the average estimate over 400 realizations, and the dash-dotted lines form the 95% pointwise confidence band for $h(w)$ or $H(v)$ when the true model is known, obtained by taking the 2.5 and 97.5 percentiles of the 400 estimates at each $w$ or $v$.
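The columns of Table 1 can be computed from the replicate estimates and their estimated standard errors. The sketch below assumes that ESD is the average of the per-replicate standard errors from (18) and that coverage refers to nominal 95% Wald intervals $\hat\beta_k \pm 1.96\,\widehat{\mathrm{se}}$; neither convention is spelled out in the text, and the function name is ours.

```python
import numpy as np

Z_975 = 1.959963984540054   # 97.5% standard normal quantile

def monte_carlo_summary(estimates, std_errors, true_value):
    """Bias, empirical SD, mean estimated SE (ESD) and 95% Wald coverage over replicates."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = estimates.mean() - true_value
    sd = estimates.std(ddof=1)
    esd = std_errors.mean()
    covered = np.abs(estimates - true_value) <= Z_975 * std_errors
    return bias, sd, esd, covered.mean()
```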

Compared with the penalized method in Ma & Kosorok (2005), our spline-based method has four clear advantages. First, our spline estimate $\hat H$ is much cheaper to compute than the cumulative sum diagram approach used in Ma & Kosorok (2005). This is because the number of B-spline basis functions (and thus the number of knots), e.g., $K_0 = 5$ and $K_1 = 5$, is usually much smaller than the sample size $n$, so the dimension of the estimation problem is greatly reduced. Second, our estimate of the transformation function $H$ is smooth and has a faster convergence rate, and we obtain a narrower confidence interval for $H$, as shown in the right panel of Figure 4. Third, we obtain an explicit consistent estimate $\hat I$, whereas the block jackknife approach proposed in Ma & Kosorok (2005) is not theoretically justified. Finally, we do not require constrained optimization in our implementation.
Figure 3: Histograms of $\hat\beta_1$ (left) and $\hat\beta_2$ (right) based on sample size $n = 1600$ and 400 replicates.




Figure 4: Left: estimate and pointwise confidence interval for $h$. Right: estimate and pointwise confidence interval for $H$. The solid line is the average estimate over 400 realizations with sample size $n = 1600$, and the dashed line is the true function. The dash-dotted lines are the 95% pointwise confidence intervals.
Table 2: The estimates and their corresponding estimated standard errors for the parametric part for the calcification data

              Extreme value distribution    Logistic distribution
β1            -0.1870                       -0.2562
ESD(β1)       0.2322                        0.2119
β2            0.3502                        0.3573
ESD(β2)       0.3481                        0.3280

ESD: Estimated standard error



6.2 Application: Calcification data 



We illustrate the proposed method with a dataset from a calcification study. Yu et al. (2001) investigated the calcification of intraocular lenses, an infrequently reported complication of cataract treatment. The objective of the study was to understand the effect of some clinical variables on the time to calcification of the lenses after implantation. Each patient was examined by an ophthalmologist to determine the status of calcification at a random time ranging from zero to thirty-six months after implantation of the intraocular lenses. The severity of calcification was graded into five categories ranging from zero to four. In our analysis, we simply treat those with severity $\ge 1$ as calcified and those with severity $< 1$ as not calcified. This dataset can be treated as current status data because only the examination time and the calcification status at examination are available. The covariates of interest are $Z_1$, incision length; $Z_2$, gender (0 for female and 1 for male); and $W$, age at implantation divided by 10. The original dataset has 379 records. We remove the one record with a missing measurement, resulting in a sample size of $n = 378$. This dataset has been studied by Xue et al. (2004), Lam & Xue (2005) and Ma (2009). Xue et al. (2004) and Lam & Xue (2005) modeled the event time directly without any transformation, so a straightforward estimate of the hazard function is not available. Ma (2009) used a cure model to fit the data and assumed a generalized linear model for the cure probability; for subjects not cured, linear and partly linear Cox proportional hazards models were used to model the survival risk.

We fit this dataset using the semiparametric additive transformation model. We take the error distribution $F$ to be one of two distributions: the extreme value distribution or the logistic distribution. We approximate $h$ and $\log \dot H$ by quadratic splines. The optimal numbers of knots for $h$ and $\log \dot H$ are 6 and 5, respectively. The estimates and their corresponding estimated standard errors for the parametric part are summarized in Table 2. The estimates of $h(w)$ based on the different error distributions are displayed in the left panel of Figure 5, and the estimates of $H(v)$ are plotted in the right panel of Figure 5. The analysis shows very similar results for the two error distributions. From Table 2, both incision length and gender are insignificant at the 5% level. From the left panel of Figure 5, $\hat h(w)$ increases steadily from age 50, peaks at age 60 and decreases gradually thereafter, which means that patients aged around 60 tend to enjoy a longer time to calcification. The estimated transformation function $\hat H$ in the right panel of Figure 5 displays nonlinear behavior, which indicates that the transformation is necessary.

Figure 5: The spline estimates of $h(w)$ (left; horizontal axis Age/10) and $H(v)$ (right; horizontal axis Time) under two different assumptions on the error distribution: extreme value distribution (solid) and logistic distribution (small dashes).
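The insignificance statement can be checked directly from Table 2 with Wald tests based on the asymptotic normality in (17), i.e., comparing $z = \hat\beta_k / \mathrm{ESD}(\hat\beta_k)$ with a standard normal. The sketch below only reuses the reported numbers; the dictionary layout and function name are ours.

```python
import math

# (estimate, ESD) pairs copied from Table 2
table2 = {
    "extreme value": {"beta1": (-0.1870, 0.2322), "beta2": (0.3502, 0.3481)},
    "logistic": {"beta1": (-0.2562, 0.2119), "beta2": (0.3573, 0.3280)},
}

def wald_p_value(estimate, se):
    """Two-sided p-value of the Wald test of beta = 0 under asymptotic normality."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))

for dist, rows in table2.items():
    for name, (est, se) in rows.items():
        print(f"{dist:14s} {name}: z = {est / se:+.3f}, p = {wald_p_value(est, se):.3f}")
```

All four $|z|$ values are below 1.96, with the largest (about 1.21, logistic $\beta_1$) giving $p \approx 0.23$.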

We can incorporate an unknown scale parameter into the residual error distribution $F(\cdot)$ to further improve the above analysis. Our general B-spline estimation framework can also handle this type of transformation model easily.

Acknowledgement 

The first author's research is supported by the National Science Foundation under grant DMS-0906497. The second author's research is supported by the National Science Foundation under grants CMMI-1030246 and DMS-1042967. The authors would like to thank Professor Alexis K. F. Yu for providing the calcification data, and Professors Michael Kosorok and Donglin Zeng for many helpful comments and suggestions to improve the paper.



Appendix 



Some useful Lemmas 

We define the ε-covering number (ε-bracketing number) as $N(\epsilon,\mathcal{A},d)$ ($N_B(\epsilon,\mathcal{A},d)$). The corresponding ε-entropy (ε-bracketing entropy) is defined as $H(\epsilon,\mathcal{A},d)=\log N(\epsilon,\mathcal{A},d)$ ($H_B(\epsilon,\mathcal{A},d)=\log N_B(\epsilon,\mathcal{A},d)$). Define $\mathcal{G}_n(\delta;\|\cdot\|)=\{g: g(v)=\gamma_0'B_0(v)\ \text{satisfying}\ \|g\|\le\delta\}$ and $\mathcal{H}_{jn}(\delta_j;\|\cdot\|)=\{h_j: h_j(w_j)=\gamma_j'B_j(w_j)\ \text{satisfying}\ \|h_j\|\le\delta_j\ \text{and}\ \int_0^1 h_j(w_j)\,dw_j=0\}$. Obviously, $\mathcal{G}_n(c_0;\|\cdot\|_\infty)=\mathcal{G}_n$ and $\mathcal{H}_{jn}(c_j;\|\cdot\|_\infty)=\mathcal{H}_{jn}$. Lemma 1 follows from the B-spline approximation property. Lemma 2 is directly implied by Lemma 2.5 in (Van de Geer, 2000). Lemma 4 is adapted from Proposition 1 in (Cheng & Huang).



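Lemma 1. There exist $g_n\in\mathcal{G}_n$ and $h_{jn}\in\mathcal{H}_{jn}$ such that

$$\|g_n-g_0\|_\infty=O(K_0^{-r_0}), \tag{A.1}$$
$$\|H_n-H_0\|_\infty=O(K_0^{-r_0}), \tag{A.2}$$
$$\|h_{jn}-h_{j0}\|_\infty=O(K_j^{-r_j}),\quad j=1,\dots,d, \tag{A.3}$$
$$\Big\|\sum_{j=1}^d h_{jn}-\sum_{j=1}^d h_{j0}\Big\|_\infty=O\Big(\max_{j=1,\dots,d}\{K_j^{-r_j}\}\Big), \tag{A.4}$$

where $H_n(v)=\int_{l_v}^{v}\exp(g_n(s))\,ds$.

Lemma 2.

$$H(\epsilon,\mathcal{G}_n(\delta;\|\cdot\|),\|\cdot\|)\lesssim K_0\log(1+4\delta/\epsilon), \tag{A.5}$$
$$H(\epsilon,\mathcal{H}_{jn}(\delta_j;\|\cdot\|),\|\cdot\|)\lesssim K_j\log(1+4\delta_j/\epsilon) \tag{A.6}$$

for $1\le j\le d$.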

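Lemma 3. Let $h=(h_1,\dots,h_d)$. Define $\mathcal{K}=\{\zeta(\beta,h,H):\beta\in\mathcal{B},\ h\in\prod_{j=1}^d\mathcal{H}_{jn},\ g\in\mathcal{G}_n\}$, where the form of $\zeta$ is defined in (A.11). We have

$$\sup_{\zeta\in\mathcal{K}}|\mathbb{G}_n\zeta|=O_P\Big(\max_{j=0,1,\dots,d}\{K_j^{1/2}\}\Big). \tag{A.7}$$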
Proof: Define $l^*(\beta,h,H)=\delta F(\beta'z+\sum_{j=1}^d h_j(w_j)+H(v))+(1-\delta)[1-F(\beta'z+\sum_{j=1}^d h_j(w_j)+H(v))]$. The construction of $l^*(\cdot)$ implies that

$$\|l^*(\beta_0,h_n,H_n)-l^*(\beta_0,h_0,H_0)\|_\infty=O\Big(\max_{j=0,1,\dots,d}\{K_j^{-r_j}\}\Big) \tag{A.8}$$

based on (A.2), (A.4) and M4. Thus, $l^*(\beta_0,h_n,H_n)$ is bounded away from zero for sufficiently large $n$.



For any $\beta_1,\beta_2\in\mathcal{B}$, $h_1,h_2\in\prod_{j=1}^d\mathcal{H}_{jn}$ and $H_1,H_2$ generated by $g_1,g_2\in\mathcal{G}_n$, we have

$$|\zeta(\beta_1,h_1,H_1)-\zeta(\beta_2,h_2,H_2)|\lesssim|l^*(\beta_1,h_1,H_1)-l^*(\beta_2,h_2,H_2)|\lesssim\|\beta_1-\beta_2\|+\sum_{j=1}^d\|h_{1j}-h_{2j}\|_\infty+\|g_1-g_2\|_\infty. \tag{A.9}$$

The first and second inequalities above follow from the fact that $l^*(\beta_0,h_n,H_n)$ is strictly positive for sufficiently large $n$ by (A.8), and from Condition M4(a), respectively. As shown in (A.9), the functions in the class $\mathcal{K}$ are Lipschitz continuous in $(\beta,h,g)$. Therefore, by combining Lemma 2 and Theorem 2.7.11 in (Van de Geer & Wellner, 1996), we obtain that

$$H_B(\epsilon,\mathcal{K},L_2(P))\lesssim\max_{0\le j\le d}\{K_j\}\log(1+M/\epsilon),$$

where $M=\max_{0\le j\le d}\{4c_j\}$. In the end, we apply Lemma 3.4.2 in (Van de Geer & Wellner, 1996) to this uniformly bounded class of functions $\mathcal{K}$ to obtain (A.7). □
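To make the objects in this proof concrete, here is a small sketch of the likelihood contribution $l^*$ and its logarithm $q(\delta,\theta)=\delta\log F(\theta)+(1-\delta)\log(1-F(\theta))$ (the function $q$ defined later in this appendix, in the proof of the consistency and rate results). It covers the two error distributions of the data example: the extreme value distribution $F(s)=1-\exp(-e^s)$ from the Introduction and the standard logistic distribution. The finite-difference curvature illustrates the condition that $|\ddot q(\delta,t)|$ is bounded away from zero and infinity on compacta. In the logistic case, $\ddot q(\delta,t)=-F(t)\{1-F(t)\}$ for both values of $\delta$. The function names and the step size are our own choices.

```python
import math

def F_extreme(t):
    """Extreme value cdf F(s) = 1 - exp(-e^s)."""
    return 1.0 - math.exp(-math.exp(t))

def F_logistic(t):
    """Standard logistic cdf."""
    return 1.0 / (1.0 + math.exp(-t))

def l_star(delta, theta, F):
    """Likelihood contribution delta*F(theta) + (1 - delta)*(1 - F(theta)),
    where theta = beta'z + sum_j h_j(w_j) + H(v) and delta is 0 or 1."""
    p = F(theta)
    return delta * p + (1 - delta) * (1.0 - p)

def q(delta, t, F):
    """q(delta, t) = delta*log F(t) + (1 - delta)*log(1 - F(t)) = log l_star for delta in {0, 1}."""
    return math.log(l_star(delta, t, F))

def q_ddot(delta, t, F, h=1e-4):
    """Central second difference approximating d^2 q / dt^2."""
    return (q(delta, t + h, F) - 2.0 * q(delta, t, F) + q(delta, t - h, F)) / h ** 2
```

Both choices of $F$ give a strictly negative curvature on a bounded range of $t$, which is the property the rate argument relies on.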

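Lemma 4. Suppose the following Conditions (B1)-(B3) hold.

B1. $\mathbb{P}_n\ell_\beta(X;\hat\alpha)=o_P(n^{-1/2})$, $\mathbb{P}_n\ell_g[a_k^*](X;\hat\alpha)=o_P(n^{-1/2})$ and $\mathbb{P}_n\ell_{h_j}[b_{jk}^*](X;\hat\alpha)=o_P(n^{-1/2})$;

B2. $\sup_{\{\alpha:\,d(\alpha,\alpha_0)\le Cn^{-r/(2r+1)}\}}\mathbb{G}_n\big(\tilde\ell_\beta(X;\alpha)-\tilde\ell_\beta(X;\alpha_0)\big)=o_P(1)$;

B3. $P\big(\tilde\ell_\beta(X;\alpha)-\tilde\ell_\beta(X;\alpha_0)\big)=-I_0(\beta-\beta_0)+o(\|\beta-\beta_0\|)+o(n^{-1/2})$ for $\alpha$ satisfying $d(\alpha,\alpha_0)\le Cn^{-r/(2r+1)}$.

If $\hat\alpha$ is consistent and $I_0$ is invertible, then we have

$$\sqrt{n}(\hat\beta-\beta_0)\rightsquigarrow N(0,I_0^{-1}).$$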
Lemma 5. (i) If $a(s,t)=a(s_1,s_2,t)\in H^r_c(\mathcal{S}_1\times\mathcal{S}_2\times\mathcal{T})$ in $t$ relative to $s_1$ and $s_2$, then $\int_{\mathcal{S}_1}a(s_1,s_2,t)\,ds_1\in H^r_{c'}(\mathcal{S}_2\times\mathcal{T})$ in $t$ relative to $s_2$.

(ii) If $a(s,t),b(s,t)\in H^r_c(\mathcal{S}\times\mathcal{T})$ in $t$ relative to $s$, then $c(s,t)=a(s,t)b(s,t)\in H^r_{c'}(\mathcal{S}\times\mathcal{T})$ in $t$ relative to $s$.

(iii) If $a(s,t)\in H^r_c(\mathcal{S}\times\mathcal{T})$ in $t$ relative to $s$ and $f(\cdot)$ has bounded derivatives up to order $\lfloor r\rfloor+1$, then $f(a(s,t))\in H^r_{c'}(\mathcal{S}\times\mathcal{T})$ in $t$ relative to $s$.

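Proof: Let $\lfloor r\rfloor$ be the largest integer smaller than $r$. Denote the $m$-th derivative of $a(s,t)$ w.r.t. $t$ as $D_t^m a(s,t)$ for $m=0,1,\dots,\lfloor r\rfloor$.

(i) Note that $D_t^m a(s_1,s_2,t)$ is bounded for $0\le m\le\lfloor r\rfloor$. By the dominated convergence theorem, we can take the derivative inside the integral to obtain

$$D_t^m\Big(\int_{\mathcal{S}_1}a(s_1,s_2,t)\,ds_1\Big)=\int_{\mathcal{S}_1}D_t^m a(s_1,s_2,t)\,ds_1,$$

which implies that $D_t^m\big(\int_{\mathcal{S}_1}a(s_1,s_2,t)\,ds_1\big)$ is bounded for $0\le m\le\lfloor r\rfloor$. Using this and the fact that

$$\frac{\big|D_t^{\lfloor r\rfloor}\big(\int_{\mathcal{S}_1}a(s_1,s_2,t_2)\,ds_1\big)-D_t^{\lfloor r\rfloor}\big(\int_{\mathcal{S}_1}a(s_1,s_2,t_1)\,ds_1\big)\big|}{|t_2-t_1|^{r-\lfloor r\rfloor}}\le\int_{\mathcal{S}_1}\sup_{s_1,s_2}\sup_{t_1\ne t_2}\frac{|D_t^{\lfloor r\rfloor}a(s_1,s_2,t_2)-D_t^{\lfloor r\rfloor}a(s_1,s_2,t_1)|}{|t_2-t_1|^{r-\lfloor r\rfloor}}\,ds_1\le c'<\infty$$

for all $s_2$ and $t_1\ne t_2$, we conclude that $\int_{\mathcal{S}_1}a(s_1,s_2,t)\,ds_1\in H^r_{c'}(\mathcal{S}_2\times\mathcal{T})$ in $t$ relative to $s_2$ for some $c'<\infty$.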

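(ii) The result is true because

$$D_t^m c(s,t)=\sum_{i+j=m}\binom{m}{i}D_t^i a(s,t)\,D_t^j b(s,t)$$

is bounded for $0\le m\le\lfloor r\rfloor$. Also we note that, for $i<\lfloor r\rfloor$,

$$\frac{|D_t^i a(s,t_2)-D_t^i a(s,t_1)|}{|t_2-t_1|^{r-\lfloor r\rfloor}}=\frac{|D_t^{i+1}a(s,\tilde t)|\,|t_2-t_1|}{|t_2-t_1|^{r-\lfloor r\rfloor}},$$

where $\tilde t$ lies between $t_1$ and $t_2$ by the mean value theorem. It can then be easily verified that

$$\sup_s\sup_{t_1\ne t_2}\frac{|D_t^{\lfloor r\rfloor}c(s,t_2)-D_t^{\lfloor r\rfloor}c(s,t_1)|}{|t_2-t_1|^{r-\lfloor r\rfloor}}<\infty.$$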
(iii) When $0<r\le 1$, the result follows from the observation that

$$\frac{f(a(s,t_2))-f(a(s,t_1))}{|t_2-t_1|^r}=\frac{f(a(s,t_2))-f(a(s,t_1))}{a(s,t_2)-a(s,t_1)}\cdot\frac{a(s,t_2)-a(s,t_1)}{|t_2-t_1|^r}.$$

Using the chain rule, the above observation and part (ii) of the lemma, the desired result can be obtained by induction for general $r$. □
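Denote

$$S_k(X;\alpha,w_k)=[\ell_\beta(X;\alpha)]_k-\ell_g[a_k](X;\alpha)-\sum_{j=1}^d\ell_{h_j}[b_{jk}](X;\alpha),$$

where $w_k=(a_k,b_{1k},\dots,b_{dk})$. Let $\mathcal{W}_n=\mathcal{G}_n\times\prod_{j=1}^d\mathcal{H}_{jn}$ and $\mathcal{N}_0=\{\alpha\in\mathcal{A}: d(\alpha,\alpha_0)=o(1)\}$.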
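All the scores in $S_k$ share the factor $Q_\theta(X)$, which we read as $\partial q(\delta,\theta)/\partial\theta$. This reading is consistent with the reparametrized efficient score $ZQ_\theta(X)-(\int_{l_v}^V a^*\,dH+\sum_j b_j^*(W_j))Q_\theta(X)$ used in the proof of asymptotic normality below. For the logistic $F$ this reduces to $Q_\theta=\delta-F(\theta)$, whose conditional mean given the covariates is zero, so each $S_k$ is mean-zero at the truth. The sketch below assumes the logistic $F$ and takes the direction values as precomputed inputs. The function names are hypothetical, and the simulated data in the check are a toy example.

```python
import numpy as np

def Q_logistic(delta, theta):
    """Q_theta = d/dtheta [delta*log F + (1-delta)*log(1-F)] for logistic F.
    Since F' = F(1-F), this equals delta*(1-F) - (1-delta)*F = delta - F."""
    return delta - 1.0 / (1.0 + np.exp(-theta))

def S_k(z_k, int_expg_a, b_vals, delta, theta):
    """S_k = [l_beta]_k - l_g[a_k] - sum_j l_{h_j}[b_jk]; each score is a direction times Q_theta.
    z_k        : k-th covariate Z_k for each observation, shape (n,)
    int_expg_a : int_{l_v}^{V} exp(g(s)) a_k(s) ds at each observed V, shape (n,)
    b_vals     : b_jk(W_j) for j = 1..d, shape (n, d)"""
    direction = z_k - int_expg_a - b_vals.sum(axis=1)
    return direction * Q_logistic(delta, theta)
```

When `delta` is drawn as Bernoulli($F(\theta)$), the sample means of `Q_logistic` and of `S_k` are close to zero for any direction that does not depend on `delta`.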
Lemma 6. Under Conditions M1-M7 and P1-P2, we have

$$E\sup_{w_k\in\mathcal{W}_n}|S_k(X;\alpha,w_k)-S_k(X;\alpha_0,w_k)|^2\lesssim d^2(\alpha,\alpha_0) \tag{A.10}$$

for all $\alpha\in\mathcal{N}_0$ and $k=1,\dots,l$.

Proof: In view of (12)-(14), after some algebra we can bound the left hand side of (A.10), up to a multiplicative constant, by

$$\begin{aligned}&\|Q_{\theta_0}-Q_\theta\|_2^2+E\sup_{a_k\in\mathcal{G}_n}\Big[\int_{l_v}^{V}(\exp(g(s))-\exp(g_0(s)))a_k(s)\,ds\,(Q_\theta-Q_{\theta_0})\Big]^2\\&\quad+E\sup_{a_k\in\mathcal{G}_n}\Big[\int_{l_v}^{V}\exp(g_0(s))a_k(s)\,ds\,(Q_\theta-Q_{\theta_0})\Big]^2+E\sup_{a_k\in\mathcal{G}_n}\Big[\int_{l_v}^{V}(\exp(g(s))-\exp(g_0(s)))a_k(s)\,ds\,Q_{\theta_0}\Big]^2\\&\quad+\sum_{j=1}^d E\sup_{b_{jk}\in\mathcal{H}_{jn}}\big[b_{jk}^2(Q_\theta-Q_{\theta_0})^2\big].\end{aligned}$$

The compactness of $\mathcal{G}_n$ and $\mathcal{H}_{jn}$ implies that the third and fifth terms above are both of the order $\|Q_\theta-Q_{\theta_0}\|_2^2$. We can further bound the second term by

$$E\Big[\sup_{a_k\in\mathcal{G}_n}\int_{l_v}^{u_v}a_k^2(s)\,ds\int_{l_v}^{u_v}[\exp(g(s))-\exp(g_0(s))]^2\,ds\,(Q_\theta-Q_{\theta_0})^2\Big].$$

Considering the compactness of $\mathcal{G}$ and $\mathcal{G}_n$, we know the second term is also of the order $\|Q_{\theta_0}-Q_\theta\|_2^2$. Assumption M4(a) together with the Cauchy-Schwarz inequality implies that $\|Q_\theta-Q_{\theta_0}\|_2^2\lesssim\|\beta-\beta_0\|^2+\|H-H_0\|_2^2+\|\sum_{j=1}^d(h_j-h_{j0})\|_2^2$. Since we assume that the density of W is bounded away from zero and infinity, we have $\|\sum_{j=1}^d(h_j-h_{j0})\|_2^2\asymp\sum_{j=1}^d\|h_j-h_{j0}\|_2^2$, considering the identifiability condition $\int_0^1 h_j(w_j)\,dw_j=0$. Assumption M7 implies that the fourth term is of the order $\|H-H_0\|_2^2$. Considering the form of $d(\alpha,\alpha_0)$, we conclude the whole proof. □



Proof of Theorem 1



Recall that $h=(h_1,\dots,h_d)$. Denote $h_0$, $h_n$ and $\hat h$ as the corresponding true value, B-spline approximation and sieve estimate, respectively. Recall that $l^*(\beta_0,h_n,H_n)$ is bounded away from zero for sufficiently large $n$, as implied by (A.8). Then, by the definition of $\hat\alpha$, we have

$$\mathbb{P}_n\log\{l^*(\hat\beta,\hat h,\hat H)/l^*(\beta_0,h_n,H_n)\}\ge 0,$$

which implies, by the inequality $a\log(x)\le\log(1+a(x-1))$ for any $x>0$ and $a\in(0,1)$, that

$$0\le\mathbb{P}_n\log\Big(1+a\Big[\frac{l^*(\hat\beta,\hat h,\hat H)}{l^*(\beta_0,h_n,H_n)}-1\Big]\Big)=\mathbb{P}_n\zeta(\hat\beta,\hat h,\hat H). \tag{A.11}$$
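The elementary inequality used to obtain (A.11) is Jensen's inequality for the concave logarithm: $\log(1+a(x-1))=\log\{(1-a)\cdot 1+a\cdot x\}\ge(1-a)\log 1+a\log x=a\log x$. Taking $a\in(0,1)$ also keeps the argument of the logarithm at least $1-a$ for $x\ge 0$, so the functions $\zeta$ are uniformly bounded below by $\log(1-a)$. The following is a quick numerical sanity check over a grid. It is only an illustration and is not part of the proof.

```python
import numpy as np

x = np.logspace(-6, 6, 400)[:, None]      # x > 0
a = np.linspace(0.01, 0.99, 99)[None, :]   # a in (0, 1)
lhs = a * np.log(x)                        # a log(x)
rhs = np.log1p(a * (x - 1.0))              # log(1 + a(x - 1))
gap = rhs - lhs                            # >= 0 by concavity of log
lower = np.log1p(-a)                       # log(1 - a): floor of log(1 + a(x - 1)) for x >= 0
```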



Lemma 3 implies that $(\mathbb{P}_n-P)\zeta(\hat\beta,\hat h,\hat H)=o_P(1)$ since $K_j/n=o(1)$ for any $j=0,1,\dots,d$. Thus, $P\zeta(\hat\beta,\hat h,\hat H)\ge o_P(1)$ based on (A.11). Let $U_n(X)=l^*(\hat\beta,\hat h,\hat H)/l^*(\beta_0,h_n,H_n)$. Based on (A.8), we know $PU_n(X)=1+o_P(1)$, which further implies $P\zeta(\hat\beta,\hat h,\hat H)\le o_P(1)$ by the concavity of $s\mapsto\log(s)$. This in turn implies that $P\zeta(\hat\beta,\hat h,\hat H)=o_P(1)$. This forces $P\big|(\beta_0'Z+\sum_{j=1}^d h_{jn}(W_j)+H_n(V))-(\hat\beta'Z+\sum_{j=1}^d\hat h_j(W_j)+\hat H(V))\big|=o_P(1)$ by the strict concavity of $s\mapsto\log s$, Conditions M4(a), P1 and P2. It is easy to verify that $ER_n^2=o_P(1)$ if $E|R_n|=o_P(1)$ for a uniformly bounded $R_n$, which is the case here by the compactness of the parameter spaces. Thus, we further have

$$P\Big((\hat\beta-\beta_0)'Z+\sum_{j=1}^d(\hat h_j-h_{jn})(W_j)+\hat H(V)-H_n(V)\Big)^2=o_P(1).$$

Combining the above equation with the identifiability condition M3, we can show $\hat\beta-\beta_0=o_P(1)$. This, in turn, implies that

$$P\Big(\sum_{j=1}^d(\hat h_j-h_{jn})(W_j)+\hat H(V)-H_n(V)\Big)^2=o_P(1).$$
Since we assume that the joint density of (V, W) is bounded away from zero in M2(b), we have

$$\int_0^1\cdots\int_0^1\int_{l_v}^{u_v}\Big(\sum_{j=1}^d(\hat h_j-h_{jn})(w_j)+\hat H(v)-H_n(v)\Big)^2dv\,dw_1\cdots dw_d=o_P(1).$$

Considering that $\int_0^1 h_j(w_j)\,dw_j=0$ for $h_j\in\mathcal{H}_j\cup\mathcal{H}_{jn}$ and that the joint density of (V, W) is bounded away from infinity, we have $\sum_{j=1}^d\|\hat h_j-h_{jn}\|_2+\|\hat H-H_n\|_2=o_P(1)$. The spline approximation results (A.2) and (A.3) conclude the proof of (9).



We next verify the conditions of Theorem 3.2.5 in Van de Geer & Wellner (1996) to establish the convergence rate result (11). Recall that $\theta(z,v,w)=\beta'z+H(v)+\sum_{j=1}^d h_j(w_j)$. Denote $\hat\theta=z'\hat\beta+\hat H(v)+\sum_{j=1}^d\hat h_j(w_j)$ as its sieve estimate. Following arguments similar to those in the proof of consistency, it suffices to show that

$$\|\hat\theta-\theta_0\|_2=O_P(n^{-r/(2r+1)}), \tag{A.12}$$

where $r=\min_{0\le j\le d}\{r_j\}$. We first need to show that

$$P[\ell(\alpha_0)-\ell(\alpha)]\gtrsim\|\theta-\theta_0\|_2^2 \tag{A.13}$$

for every $\alpha$ in a neighborhood of $\alpha_0$. Define $q(\delta,t)=\delta\log(F(t))+(1-\delta)\log(1-F(t))$ and let $\ddot q(\delta,t)$ be its second derivative w.r.t. $t$. Since $\alpha_0$ maximizes $\alpha\mapsto P\ell(\alpha)$, the first-order term in the Taylor expansion of $q$ around $\theta_0$ vanishes, and we have

$$P[\ell(\alpha_0)-\ell(\alpha)]=P\Big[-\frac{1}{2}\ddot q(\delta,\tilde\theta)(\theta-\theta_0)^2\Big],$$

where $\tilde\theta$ is on the line segment between $\theta$ and $\theta_0$. The compactness of the parameter spaces implies that $P[\ell(\alpha_0)-\ell(\alpha)]\asymp\|\theta-\theta_0\|_2^2$. This completes the proof of (A.13). We next calculate the order of $E^*\sup_{\|\theta-\theta_0\|_2<\delta}|\mathbb{G}_n(\ell(\alpha)-\ell(\alpha_0))|$ as a function of $\delta$, denoted as $\phi_n(\delta)$, by the use of Lemma 3.4.2 of Van de Geer & Wellner (1996). Let $\mathcal{F}_{1n}(\delta)=\{\ell(\alpha)-\ell(\alpha_0): g\in\mathcal{G}_n,\ h_j\in\mathcal{H}_{jn},\ \|\theta-\theta_0\|_2<\delta\}$.

Using the same argument as in the proof of Lemma 3, we obtain that $H_B(\epsilon,\mathcal{F}_{1n}(\delta),L_2(P))$ is bounded by $C\max_{0\le j\le d}\{K_j\}\log(1+\delta/\epsilon)$. This leads to

$$J_B(\delta,\mathcal{F}_{1n}(\delta),L_2(P))=\int_0^\delta\sqrt{1+H_B(\epsilon,\mathcal{F}_{1n}(\delta),L_2(P))}\,d\epsilon\le C\max_{0\le j\le d}\{\sqrt{K_j}\}\,\delta.$$

The compactness of $\mathcal{G}_n$ and $\mathcal{H}_{jn}$ implies the uniform boundedness of any $f\in\mathcal{F}_{1n}(\delta)$. Thus, Lemma 3.4.2 of Van de Geer & Wellner (1996) gives $\phi_n(\delta)=\max_{0\le j\le d}\{\sqrt{K_j}\}\,\delta+\max_{0\le j\le d}\{K_j\}/\sqrt{n}$.
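The bound on $J_B$ follows from the substitution $\epsilon=\delta u$. If $H_B(\epsilon)\le CK\log(1+\delta/\epsilon)$, then

$$\int_0^\delta\sqrt{1+CK\log(1+\delta/\epsilon)}\,d\epsilon=\delta\int_0^1\sqrt{1+CK\log(1+1/u)}\,du\le\delta\{1+\sqrt{CK}\,c_*\},$$

with $c_*=\int_0^1\sqrt{\log(1+1/u)}\,du<\infty$. Hence $J_B\lesssim\sqrt{K}\,\delta$ for $K\ge 1$. The sketch below checks this scaling numerically. It is our own check: it sets $C=1$ and applies the further substitution $u=e^{-s}$, which removes the integrable singularity at $u=0$.

```python
import numpy as np

def _trapezoid(y, x):
    """Composite trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def entropy_integral(delta, K, C=1.0, s_max=60.0, n_grid=200_001):
    """J(delta) = int_0^delta sqrt(1 + C*K*log(1 + delta/eps)) d eps.
    With eps = delta*exp(-s): J = delta * int_0^inf sqrt(1 + C*K*log(1 + e^s)) e^{-s} ds,
    truncated at s_max (the neglected tail is of order sqrt(s_max) * exp(-s_max))."""
    s = np.linspace(0.0, s_max, n_grid)
    integrand = np.sqrt(1.0 + C * K * np.log1p(np.exp(s))) * np.exp(-s)
    return delta * _trapezoid(integrand, s)

def c_star(s_max=60.0, n_grid=200_001):
    """c_* = int_0^1 sqrt(log(1 + 1/u)) du, computed via u = exp(-s)."""
    s = np.linspace(0.0, s_max, n_grid)
    return _trapezoid(np.sqrt(np.log1p(np.exp(s))) * np.exp(-s), s)
```

The ratio `entropy_integral(delta, K) / (sqrt(K) * delta)` does not depend on `delta` and stays below $1/\sqrt{K}+c_*$. This is the linear-in-$\delta$, $\sqrt{K}$-growth behavior used for $\phi_n(\delta)$.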
By solving $\delta_{1n}^{-2}\phi_n(\delta_{1n})\le\sqrt{n}$, we get

$$\delta_{1n}=O\Big(\max_{0\le j\le d}\{\sqrt{K_j}\}/\sqrt{n}\Big). \tag{A.14}$$

In the end, we show that $\mathbb{P}_n\ell(\hat\alpha)-\mathbb{P}_n\ell(\alpha_0)\ge -O_P(\delta_{2n}^2)$, where $\delta_{2n}=\max_{0\le j\le d}\{K_j^{-r_j}\}$. The definition of $\hat\alpha$ implies that

$$\mathbb{P}_n[\ell(\hat\alpha)-\ell(\alpha_0)]\ge A_n+B_n,$$

where $A_n=(\mathbb{P}_n-P)\{\ell(\beta_0,H_n,h_n)-\ell(\alpha_0)\}$ and $B_n=P\{\ell(\beta_0,H_n,h_n)-\ell(\alpha_0)\}$. A straightforward Taylor expansion gives

$$A_n=(\mathbb{P}_n-P)\Big\{\ell_2(\beta_0,\tilde H_n,\tilde h_n)(H_n-H_0)+\sum_{j=1}^d\ell_{2+j}(\beta_0,\tilde H_n,\tilde h_n)(h_{jn}-h_{j0})\Big\},$$

where $\ell_t$ is the Fréchet derivative of $\ell(\beta_0,H,h)$ w.r.t. the $t$-th argument and $(\tilde H_n,\tilde h_n)$ lies between $(H_n,h_n)$ and $(H_0,h_0)$. Considering (A.2), (A.3) and the fact that $0<\epsilon_1<|\ddot q(\delta,t)|<\epsilon_2<\infty$ for $t$ in some compacta of $\mathbb{R}^1$, we have

$$P\Bigg(\frac{\ell_2(\beta_0,\tilde H_n,\tilde h_n)(H_n-H_0)+\sum_{j=1}^d\ell_{2+j}(\beta_0,\tilde H_n,\tilde h_n)(h_{jn}-h_{j0})}{\max_{0\le j\le d}\{K_j^{-r_j}\}\,n^{\epsilon}}\Bigg)^2\to 0 \tag{A.15}$$

for any $\epsilon>0$. Let $\mathcal{F}_{2n}=\{\ell(\beta_0,H,h)-\ell(\alpha_0): g\in\mathcal{G}_n,\ h_j\in\mathcal{H}_{jn},\ \|g-g_0\|_\infty\le C_0K_0^{-r_0},\ \|h_j-h_{j0}\|_\infty\le C_jK_j^{-r_j}\}$. An analysis similar to that in Lemma 3 shows that the bracketing entropy integral (in terms of $L_2(P)$) for $\mathcal{F}_{2n}$ is finite, which yields that $\mathcal{F}_{2n}$ is P-Donsker. Combining this P-Donsker result and (A.15), we use Corollary 2.3.12 of Van de Geer & Wellner (1996) to conclude that $\sqrt{n}A_n/(\max_{0\le j\le d}\{K_j^{-r_j}\}n^{\epsilon})=o_P(1)$. By choosing some proper $0<\epsilon<1/2$ satisfying $n^{\epsilon-1/2}=\max_{0\le j\le d}\{K_j^{-r_j}\}$, we have $A_n=o_P(\max_{0\le j\le d}\{K_j^{-2r_j}\})$. We can also show $B_n\ge -O(\max_{0\le j\le d}\{K_j^{-2r_j}\})$ by an analysis similar to that of (A.13). This shows that

$$\delta_{2n}=\max_{0\le j\le d}\{K_j^{-r_j}\}. \tag{A.16}$$

Therefore, we have $d(\hat\alpha,\alpha_0)=O_P(\delta_{1n}\vee\delta_{2n})$, i.e., (10), which directly implies (11) by choosing $K_j\asymp n^{1/(2r_j+1)}$. □
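The choice $K_j\asymp n^{1/(2r_j+1)}$ balances the two terms for each component, since $\sqrt{K_j/n}=K_j^{-r_j}=n^{-r_j/(2r_j+1)}$. The map $r\mapsto r/(2r+1)$ is increasing, so the maxima defining $\delta_{1n}$ and $\delta_{2n}$ are both attained at the least smooth component. Every B-spline estimator therefore inherits the rate $n^{-r/(2r+1)}$ with $r=\min_j r_j$, which is the convergence rate interference phenomenon mentioned in the Abstract. The sketch below sets all constants to one, an assumption made only for illustration.

```python
def component_terms(n, smoothness):
    """For each smoothness r_j take K_j = n**(1/(2 r_j + 1)) (constants set to 1) and
    return (K_j, sqrt(K_j / n), K_j**(-r_j)): the terms entering delta_1n and delta_2n."""
    terms = []
    for r in smoothness:
        K = n ** (1.0 / (2.0 * r + 1.0))
        terms.append((K, (K / n) ** 0.5, K ** (-r)))
    return terms

def overall_rate(n, smoothness):
    """delta_1n v delta_2n, each a maximum over the components j = 0, ..., d."""
    terms = component_terms(n, smoothness)
    delta_1n = max(var for _, var, _ in terms)
    delta_2n = max(bias for _, _, bias in terms)
    return max(delta_1n, delta_2n)
```

With $n=10^6$ and smoothness $(1,2,3)$, the overall rate equals $n^{-1/3}$, the same value obtained with the roughest component alone.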

Proof of Theorem 2

We apply Lemma 4 to prove this theorem. We first check Condition B1. Obviously, $\mathbb{P}_n\ell_\beta(X;\hat\alpha)=0$ since $\hat\beta$ maximizes $\ell(\beta,\hat g,\hat h_1,\dots,\hat h_d)$, $\hat\beta$ is consistent and $\beta_0$ is an interior point of $\mathcal{B}$. Following the analysis on page 2282 of Ma & Kosorok (2005), we can write, with $\tilde a^*(v)=\int_{l_v}^{v}\exp(g_0(s))a^*(s)\,ds$,

$$b_j^*(w_j)=\Pi_jD(v,w)-\int_{l_v}^{u_v}\tilde a^*(v)S_f(v,w_j)\,dv-\sum_{i\ne j}\int_0^1 b_i^*(w_i)T_f(w_i,w_j)\,dw_i.$$

According to Lemma 5 and the dominated convergence theorem, under Condition M5, and since $b_{jk}^*\in L_2(w_j)$ and $a_k^*\in L_2(H)$ (thus $\tilde a_k^*$ is uniformly bounded), we know that $b_{jk}^*(w_j)\in H^{r_j}_{c_j}[0,1]$ for some $0<c_j<\infty$. Then, for each $b_{jk}^*$, there exists a $b_{jkn}\in\mathcal{H}_{jn}$ such that

$$\|b_{jk}^*-b_{jkn}\|_\infty=O(n^{-r_j/(2r_j+1)}) \tag{A.17}$$

by the B-spline approximation property and the assumption that $K_j\asymp n^{1/(2r_j+1)}$.

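Since $\mathbb{P}_n\ell_{h_j}[b_{jkn}](X;\hat\alpha)=0$ for any $b_{jkn}\in\mathcal{H}_{jn}$, it suffices to show that

$$\mathbb{P}_n\{\ell_{h_j}[b_{jkn}-b_{jk}^*](X;\hat\alpha)\}=o_P(n^{-1/2}). \tag{A.18}$$

We can decompose the left hand side of (A.18) as $I_{1n}+I_{2n}$, where

$$I_{1n}=P\{\ell_{h_j}[b_{jkn}-b_{jk}^*](X;\hat\alpha)-\ell_{h_j}[b_{jkn}-b_{jk}^*](X;\alpha_0)\},\qquad I_{2n}=(\mathbb{P}_n-P)\{\ell_{h_j}[b_{jkn}-b_{jk}^*](X;\hat\alpha)\}.$$

By the Cauchy-Schwarz inequality, we have $I_{1n}\lesssim\|b_{jkn}-b_{jk}^*\|_\infty\|\hat\theta-\theta_0\|_2$ based on Conditions M4(a), P1 and P2. Thus, (A.12) and (A.17) imply that $I_{1n}=O_P(n^{-2r/(2r+1)})=o_P(n^{-1/2})$ since $r>1/2$.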
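The last step combines two elementary facts. First, $x\mapsto x/(2x+1)$ is increasing, so $r_j\ge r$ gives $n^{-r_j/(2r_j+1)}\le n^{-r/(2r+1)}$. Second, the exponent of the product exceeds $1/2$ exactly when $r>1/2$:

```latex
\|b_{jkn}-b^*_{jk}\|_\infty\,\|\hat\theta-\theta_0\|_2
  = O\big(n^{-r_j/(2r_j+1)}\big)\,O_P\big(n^{-r/(2r+1)}\big)
  = O_P\big(n^{-2r/(2r+1)}\big),
\qquad
\frac{2r}{2r+1}>\frac12 \iff 4r>2r+1 \iff r>\frac12 .
```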
Define $\mathcal{A}_n(\delta)=\{\alpha\in\mathcal{A}_n: d(\alpha,\alpha_0)\le C_1\delta\}$ and $\mathcal{H}'_{jn}(\delta)=\{b_{jkn}\in\mathcal{H}_{jn}: \|b_{jkn}-b_{jk}^*\|_\infty\le C_2\delta\}$ for some $0<C_1,C_2<\infty$. For the term $I_{2n}$, we first consider the following class of functions:

$$\mathcal{I}_n=\big\{\ell_{h_j}[b_{jkn}-b_{jk}^*](X;\alpha):\ \alpha\in\mathcal{A}_n(n^{-r/(2r+1)})\ \text{and}\ b_{jkn}\in\mathcal{H}'_{jn}(n^{-r_j/(2r_j+1)})\big\}.$$

For simplicity, we write a function in $\mathcal{I}_n$ as $f_{\theta,b_{jkn}}(x)$. Let $\Theta_n(\delta)=\{\beta'z+H(v)+\sum_{j=1}^d h_j(w_j):\ \alpha\in\mathcal{A}_n(\delta)\}$. It is easy to verify that, for every $x$,

$$|f_{\theta_1,b_{jkn1}}(x)-f_{\theta_2,b_{jkn2}}(x)|\lesssim\|\theta_1-\theta_2\|_\infty+\|b_{jkn1}-b_{jkn2}\|_\infty, \tag{A.19}$$

where $\theta_i\in\Theta_n(n^{-r/(2r+1)})$ for $i=1,2$. Let $\theta^1,\dots,\theta^{N_1}$ with $N_1=N(\epsilon,\Theta_n(n^{-r/(2r+1)}),\|\cdot\|_\infty)$ and $b_{jkn}^1,\dots,b_{jkn}^{N_2}$ with $N_2=N(\epsilon,\mathcal{H}'_{jn}(n^{-r_j/(2r_j+1)}),\|\cdot\|_\infty)$ be the ε-covers for $\Theta_n(n^{-r/(2r+1)})$ and $\mathcal{H}'_{jn}(n^{-r_j/(2r_j+1)})$, respectively. Thus, we can construct the brackets $[f_{\theta^s,b_{jkn}^t}-2C\epsilon,\ f_{\theta^s,b_{jkn}^t}+2C\epsilon]$ covering $\mathcal{I}_n$. The bracket size is $4C\epsilon$. Hence, we obtain

$$H_B(\epsilon,\mathcal{I}_n,L_2(P_X))\le H(\epsilon/(4C),\Theta_n(n^{-r/(2r+1)}),\|\cdot\|_\infty)+H(\epsilon/(4C),\mathcal{H}'_{jn}(n^{-r_j/(2r_j+1)}),\|\cdot\|_\infty)\lesssim\max_{0\le j\le d}\{K_j\}\log\big(1+n^{-r/(2r+1)}/\epsilon\big)$$

based on Lemma 2. We next apply Lemma 3.4.2 in Van de Geer & Wellner (1996) to show $E\|\mathbb{G}_n\|_{\mathcal{I}_n}=o(1)$, which yields $I_{2n}=o_P(n^{-1/2})$. We first calculate the ε-bracketing entropy integral

$$J_B(\delta,\mathcal{I}_n,L_2(P_X))=\int_0^\delta\sqrt{1+H_B(\epsilon,\mathcal{I}_n,L_2(P_X))}\,d\epsilon\lesssim\max_{0\le j\le d}\{K_j^{1/2}\}\,\delta.$$

Note that $\|f\|_2\lesssim\|b_{jkn}-b_{jk}^*\|_2$ and $\|f\|_\infty\lesssim\|b_{jkn}-b_{jk}^*\|_\infty$ for any $f\in\mathcal{I}_n$. Thus $\delta$ and $M$ in Lemma 3.4.2 of Van de Geer & Wellner (1996) are both chosen as $K_j^{-r_j}$, i.e., $n^{-r_j/(2r_j+1)}$. Then, by Lemma 3.4.2 of Van de Geer & Wellner (1996) and some algebra, we have that $E\|\mathbb{G}_n\|_{\mathcal{I}_n}=o(1)$ since $r>1/2$. We have thus verified that $\mathbb{P}_n\ell_{h_j}[b_{jk}^*](X;\hat\alpha)=o_P(n^{-1/2})$.



We next show that $\mathbb{P}_n\ell_g[a_k^*](X;\hat\alpha)=o_P(n^{-1/2})$ by similar arguments. Similarly, we have

$$\tilde a^*(v)=\Pi_aD(v,w)-\sum_{j=1}^d\int_0^1 b_j^*(w_j)U_f(w_j,v)\,dw_j.$$

Recall that $\tilde a_k^*(v)=\int_{l_v}^{v}\exp(g_0(s))a_k^*(s)\,ds$. Under Condition M6 and the assumption that $g_0\in H^{r_0}[l_v,u_v]$, we can show that $\tilde a_k^*\in H^{r_0+1}_{c_0}[l_v,u_v]$, which implies that $a_k^*\in H^{r_0}_{c_0}[l_v,u_v]$ for some $0<c_0<\infty$, based on Lemma 5. We next show that $I'_{1n}=o_P(n^{-1/2})$ and $I'_{2n}=o_P(n^{-1/2})$, where

$$I'_{1n}=P\{\ell_g[a_{kn}-a_k^*](X;\hat\alpha)-\ell_g[a_{kn}-a_k^*](X;\alpha_0)\},\qquad I'_{2n}=(\mathbb{P}_n-P)\{\ell_g[a_{kn}-a_k^*](X;\hat\alpha)\},$$

and $a_{kn}\in\mathcal{G}_n$ satisfies $\|a_{kn}-a_k^*\|_\infty=O(K_0^{-r_0})$ for any $k=1,\dots,l$. Similarly, by the Cauchy-Schwarz inequality, we can show that

$$I'_{1n}\lesssim\|a_{kn}-a_k^*\|_\infty\|\hat\theta-\theta_0\|_2+P\Big|\int_{l_v}^{V}(\exp(\hat g)-\exp(g_0))(s)(a_{kn}-a_k^*)(s)\,ds\Big|\lesssim\|a_{kn}-a_k^*\|_\infty\Big(\|\hat\beta-\beta_0\|+\|\hat H-H_0\|_2+\sum_{j=1}^d\|\hat h_j-h_{j0}\|_2\Big)=O_P(n^{-2r/(2r+1)})=o_P(n^{-1/2})$$

by choosing $K_j\asymp n^{1/(2r_j+1)}$. Following arguments similar to those used for $I_{2n}$, we can show that $I'_{2n}=o_P(n^{-1/2})$. Thus, we have verified Condition B1 in Lemma 4. We again apply Lemma 3.4.2 of Van de Geer & Wellner (1996) to verify Assumption B2. The details are skipped because of their similarity to the previous analysis.

It remains to verify Assumption B3. This can be easily established using the Taylor expansion in Banach space. However, we first need to reparameterize the efficient score function $\tilde\ell_\beta(X;\alpha)$ as

$$\tilde\ell_\beta(X;\alpha^*)=ZQ_\theta(X)-\Big(\int_{l_v}^{V}a^*(s)\,dH(s)+\sum_{j=1}^d b_j^*(W_j)\Big)Q_\theta(X)=\ell_\beta(X;\alpha^*)-\ell_\eta[c^*](X;\alpha^*),$$

where $\alpha^*=(\beta,H,h_1,\dots,h_d)$, $\eta=(H,h_1,\dots,h_d)$ and $c^*=(a^*,b_1^*,\dots,b_d^*)$. We first derive two useful equalities, (A.23) and (A.24). Let $E_{\alpha^*}$ be the expectation corresponding to the reparametrized likelihood under the parameter $\alpha^*$. Since $E_{\alpha^*}\tilde\ell_\beta(X;\alpha^*)=0$, we have

$$\frac{d}{dt}\Big|_{t=0}E_{\alpha_t^*}\tilde\ell_\beta(X;\alpha_t^*)=0, \tag{A.20}$$

where $\alpha_t^*=\alpha_0^*+te$. Define $\tilde\ell_{\beta,\beta}$ and $\tilde\ell_{\beta,\eta}[c]$ as the first derivatives of $\tilde\ell_\beta$ w.r.t. $\beta$ and $\eta$ (along the direction $c$), respectively. By setting $e=(e_\beta',0,\dots,0)'$ and $e=(0,c)'=(0,\Delta H,b_1,\dots,b_d)'$, respectively, some calculations reveal that

$$E\{\tilde\ell_{\beta,\beta}(X;\alpha_0^*)e_\beta\}+E\{\tilde\ell_\beta(X;\alpha_0^*)\ell_\beta'(X;\alpha_0^*)e_\beta\}=0, \tag{A.21}$$
$$E\{\tilde\ell_{\beta,\eta}[c](X;\alpha_0^*)\}+E\{\tilde\ell_\beta(X;\alpha_0^*)\ell_\eta[c](X;\alpha_0^*)\}=0 \tag{A.22}$$

based on (A.20). By considering the orthogonality property of $\tilde\ell_\beta$ and the above reparametrization, we obtain the following two useful facts:

$$I_0=-E\{\tilde\ell_{\beta,\beta}(X;\alpha_0^*)\}, \tag{A.23}$$
$$E\{\tilde\ell_{\beta,\eta}[c](X;\alpha_0^*)\}=0, \tag{A.24}$$

based on (A.21) and (A.22).

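Define $\tilde\ell_{\beta,\alpha^*,\alpha^*}[h_1,h_2](X;\alpha^*)$ as the second order Fréchet derivative of $\tilde\ell_\beta$ w.r.t. $\alpha^*$ along the direction $[h_1,h_2]$ at the point $\alpha^*$. The same notation rule applies to $\ell_{\beta,\alpha^*,\alpha^*}[h_1,h_2](X;\alpha^*)$ and $\ell_{\eta,\alpha^*,\alpha^*}[h_1,h_2,h_3](X;\alpha^*)$. Now we are ready to express the Taylor expansion as follows:

$$\begin{aligned}E[\tilde\ell_\beta(X;\alpha)-\tilde\ell_\beta(X;\alpha_0)]&=E[\tilde\ell_\beta(X;\alpha^*)-\tilde\ell_\beta(X;\alpha_0^*)]\\&=E\{\tilde\ell_{\beta,\beta}(X;\alpha_0^*)\}(\beta-\beta_0)+E\{\tilde\ell_{\beta,\eta}[\eta-\eta_0](X;\alpha_0^*)\}+\frac12E\{\tilde\ell_{\beta,\alpha^*,\alpha^*}[\Delta\alpha^*,\Delta\alpha^*](X;\bar\alpha^*)\}\\&=-I_0(\beta-\beta_0)+\frac12E\{\ell_{\beta,\alpha^*,\alpha^*}[\Delta\alpha^*,\Delta\alpha^*](X;\bar\alpha^*)-\ell_{\eta,\alpha^*,\alpha^*}[c^*,\Delta\alpha^*,\Delta\alpha^*](X;\bar\alpha^*)\},\end{aligned}$$

where $\Delta\alpha^*=\alpha^*-\alpha_0^*$ and $\bar\alpha^*$ lies between $\alpha^*$ and $\alpha_0^*$. The last equality above follows from (A.23) and (A.24). Now we only need to show that the second term in the last equality is of the order $o(\|\beta-\beta_0\|)+o(n^{-1/2})$.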

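Let $\Delta H=H-H_0$ and $\Delta h_j=h_j-h_{j0}$. After some algebra, we obtain

$$\ell_{\beta,\alpha^*,\alpha^*}[\Delta\alpha^*,\Delta\alpha^*](X;\bar\alpha^*)=Z\ddot Q_{\bar\theta}(X)\Big(Z'(\beta-\beta_0)+\Delta H(V)+\sum_{j=1}^d\Delta h_j(W_j)\Big)^2,$$

$$\begin{aligned}\ell_{\eta,\alpha^*,\alpha^*}[c^*,\Delta\alpha^*,\Delta\alpha^*](X;\bar\alpha^*)&=\Big(\int_{l_v}^{V}a^*(s)\,d\bar H(s)+\sum_{j=1}^d b_j^*(W_j)\Big)\ddot Q_{\bar\theta}(X)\Big(Z'(\beta-\beta_0)+\Delta H(V)+\sum_{j=1}^d\Delta h_j(W_j)\Big)^2\\&\quad+2\int_{l_v}^{V}a^*(s)\,d\Delta H(s)\,\dot Q_{\bar\theta}(X)\Big(Z'(\beta-\beta_0)+\Delta H(V)+\sum_{j=1}^d\Delta h_j(W_j)\Big),\end{aligned}$$

where $\dot Q_\theta$ and $\ddot Q_\theta$ denote the first and second derivatives of $Q_\theta$ w.r.t. $\theta$, $\bar H$ is the $H$-component of $\bar\alpha^*$, and $\bar\theta$ lies between $\theta$ and $\theta_0$. Considering the assumption that $d(\alpha,\alpha_0)\le C_1n^{-r/(2r+1)}$ and the previously shown result that $\tilde a_k^*$ and $b_{jk}^*$ are both uniformly bounded, we can verify Assumption B3 based on the above expressions. This completes the proof of Theorem 2. □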



Proof of Theorem 3

For simplicity, we write $S_k(X;\alpha_0,w_k)$ and $S_k(X;\hat\alpha,w_k)$ as $S_k^0[w_k]$ and $S_k[w_k]$, respectively. Based on the definitions of $I_0$ and (19), we know that their $(k,k')$-th entries can be written as

$$I_0(k,k')=ES_k^0[w_k^*]S_{k'}^0[w_{k'}^*], \tag{A.25}$$
$$\hat I(k,k')=\mathbb{P}_nS_k[\hat w_k]S_{k'}[\hat w_{k'}], \tag{A.26}$$

where $w_k^*=(a_k^*,b_{1k}^*,\dots,b_{dk}^*)$ and $\hat w_k=((\hat\gamma_0^k)'B_0,(\hat\gamma_1^k)'B_1,\dots,(\hat\gamma_d^k)'B_d)$. It is easy to show that

$$E\sup_{\alpha\in\mathcal{N}_0,\,w_k\in\mathcal{W}_n}|S_k(X;\alpha,w_k)|^2\le\text{const.}<\infty \tag{A.27}$$

since $\mathcal{A}$ and $\mathcal{W}_n$ are both assumed to be compact. Note that (A.27) implies that $\{S_k(x;\alpha,w_k):\alpha\in\mathcal{N}_0,\ w_k\in\mathcal{W}_n\}$ is P-Glivenko-Cantelli. Then, we know that, uniformly over $w_k,w_{k'}\in\mathcal{W}_n$,

$$\mathbb{P}_nS_k[w_k]S_{k'}[w_{k'}]=ES_k[w_k]S_{k'}[w_{k'}]+o_P(1) \tag{A.28}$$

by considering Corollary 9.27 of Kosorok (2008). Uniformly over $w_k,w_{k'}\in\mathcal{W}_n$, we have

$$\begin{aligned}&\big|ES_k[w_k]S_{k'}[w_{k'}]-ES_k^0[w_k]S_{k'}^0[w_{k'}]\big|\\&\le E\big|S_k[w_k](S_{k'}[w_{k'}]-S_{k'}^0[w_{k'}])\big|+E\big|S_{k'}^0[w_{k'}](S_k[w_k]-S_k^0[w_k])\big|\\&\le\|S_k[w_k]\|_2\|S_{k'}[w_{k'}]-S_{k'}^0[w_{k'}]\|_2+\|S_{k'}^0[w_{k'}]\|_2\|S_k[w_k]-S_k^0[w_k]\|_2\\&\le o_P(1),\end{aligned} \tag{A.29}$$

where the last inequality follows from (A.10) (together with the consistency of $\hat\alpha$) and (A.27). Combining (A.28) and (A.29), we have obtained that

$$\sup_{w_k,w_{k'}\in\mathcal{W}_n}\big|\mathbb{P}_nS_k[w_k]S_{k'}[w_{k'}]-ES_k^0[w_k]S_{k'}^0[w_{k'}]\big|=o_P(1), \tag{A.30}$$

which implies that

$$\hat I(k,k')=ES_k^0[\hat w_k]S_{k'}^0[\hat w_{k'}]+o_P(1). \tag{A.31}$$



To finish the proof, we need to introduce $w_k^\circ=\arg\min_{w_k\in\mathcal{W}_n}E\{S_k^0[w_k]\}^2$ as a bridge. Now, it remains to show that

$$ES_k^0[\hat w_k]S_{k'}^0[\hat w_{k'}]-ES_k^0[w_k^\circ]S_{k'}^0[w_{k'}^\circ]=o_P(1), \tag{A.32}$$
$$ES_k^0[w_k^\circ]S_{k'}^0[w_{k'}^\circ]-I_0(k,k')=o(1). \tag{A.33}$$

We first consider (A.32). By an analysis similar to that applied to (A.29), we know that (A.32) holds if $\|S_k^0[\hat w_k]-S_k^0[w_k^\circ]\|_2=o_P(1)$. Denote $M_n(w)$ and $M(w)$ as $\mathbb{P}_nS_k^2[w]$ and $\|S_k^0[w]\|_2^2$, respectively. The definition of $w_k^\circ$ further implies that

$$\|S_k^0[\hat w_k]-S_k^0[w_k^\circ]\|_2^2\le\|S_k^0[\hat w_k]\|_2^2-\|S_k^0[w_k^\circ]\|_2^2=\mathbb{P}_nS_k^2[\hat w_k]-\|S_k^0[w_k^\circ]\|_2^2+o_P(1)=M_n(\hat w_k)-M(w_k^\circ)+o_P(1),$$

where the first equality follows from (A.30). By the definitions of $\hat w_k$ and $w_k^\circ$, we have

$$M_n(\hat w_k)-M(\hat w_k)\le M_n(\hat w_k)-M(w_k^\circ)\le M_n(w_k^\circ)-M(w_k^\circ).$$

Therefore, we conclude the proof of (A.32) by applying (A.30) to the above inequality. We next consider (A.33). Again, by the form of $I_0(k,k')$ given in (A.25) and an analysis similar to that of (A.32), we only need to show that $\|S_k^0[w_k^\circ]-S_k^0[w_k^*]\|_2\to 0$. By the definitions of $w_k^\circ$ and $w_k^*$, we have

$$\begin{aligned}\|S_k^0[w_k^\circ]-S_k^0[w_k^*]\|_2&\le\inf_{w_k\in\mathcal{W}_n}\|S_k^0[w_k]-S_k^0[w_k^*]\|_2\\&\le\inf_{a_k\in\mathcal{G}_n}\|\ell_g[a_k-a_k^*](X;\alpha_0)\|_2+\sum_{j=1}^d\inf_{b_{jk}\in\mathcal{H}_{jn}}\|\ell_{h_j}[b_{jk}-b_{jk}^*](X;\alpha_0)\|_2\\&\lesssim\inf_{a_k\in\mathcal{G}_n}\|a_k-a_k^*\|_\infty+\sum_{j=1}^d\inf_{b_{jk}\in\mathcal{H}_{jn}}\|b_{jk}-b_{jk}^*\|_\infty,\end{aligned}$$

where the last inequality trivially follows from the form of $\ell_g[a]$ and $\ell_{h_j}[b_j]$. According to the analysis in the proof of Theorem 2, we know that $a_k^*\in H^{r_0}_{c_0}[l_v,u_v]$ and $b_{jk}^*\in H^{r_j}_{c_j}[0,1]$. Thus, we have $\|S_k^0[w_k^\circ]-S_k^0[w_k^*]\|_2\to 0$ based on the last inequality above. This completes the whole proof. □
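Computationally, $\hat w_k$ minimizes $M_n(w)=\mathbb{P}_nS_k^2[w]$ over the sieve, as used in the bridge argument above. Because $S_k[w_k]$ is linear in the spline coefficients $(\gamma_0^k,\dots,\gamma_d^k)$, $\hat w_k$ is an ordinary least-squares fit of $[\ell_\beta]_k$ on the nuisance-score columns, and $\hat I$ is the uncentered empirical second-moment matrix of the residuals. The sketch below rests on stated assumptions. The design matrices are hypothetical inputs: in practice each column would be $Q_{\hat\theta}$ times a B-spline basis function of some $W_j$, or $Q_{\hat\theta}$ times its integral against $\exp(\hat g)$. The norm bounds that make the sieve compact are ignored. The toy check uses random columns in place of actual spline scores.

```python
import numpy as np

def efficient_information(score_beta, nuisance_scores):
    """Least-squares version of the B-spline variance estimate (a sketch).

    score_beta      : (n, l) array; row i is l_beta(X_i; alpha_hat)
    nuisance_scores : (n, p) array; each column is the score l_g[B_0m] or l_{h_j}[B_jm]
                      of one B-spline basis function, evaluated at alpha_hat
    Returns (I_hat, S), where column k of S holds S_k[w_hat_k] for every observation."""
    # w_hat_k minimizes P_n S_k^2[w]; S_k is linear in the spline coefficients,
    # so all l columns are fitted at once by ordinary least squares.
    coef, *_ = np.linalg.lstsq(nuisance_scores, score_beta, rcond=None)
    S = score_beta - nuisance_scores @ coef
    n = score_beta.shape[0]
    I_hat = S.T @ S / n   # (k, k') entry: P_n S_k[w_hat_k] S_k'[w_hat_k']
    return I_hat, S
```

The residuals are orthogonal to every nuisance column, so $\hat I$ is never larger, in the positive semidefinite order, than the naive $\mathbb{P}_n\ell_\beta\ell_\beta'$. This mirrors the projection that defines the efficient information $I_0$.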